INTERACTION TRAP SYSTEM FOR ISOLATING NOVEL PROTEINS
Background of the Invention
This invention was made with Government support
awarded by the National Institutes of Health. The
government has certain rights in the invention. This
invention relates to methods for isolating novel
proteins. This invention also relates to cancer
diagnostics and therapeutics.
In most eukaryotic cells, the cell cycle is
governed by controls exerted during G1 and G2. During
G2, cells decide whether to enter M in response to
relatively uncharacterized intracellular signals, such as
those that indicate completion of DNA synthesis (Nurse,
Nature 344:503-508, 1990; Enoch and Nurse, Cell 65:921-923,
1991). During G1, cells either enter S or withdraw
from the cell cycle and enter a nondividing state known
as G0 (Pardee, Science 246:603-608, 1989). While the
control mechanisms for these decisions are not yet well
understood, their function is clearly central to
processes of normal metazoan development and to
carcinogenesis.

In yeast, and probably in all eukaryotes, the G1/S
and G2/M transitions depend on a family of ~34 kd protein
kinases, the Cdc2 proteins, encoded by the cdc2+ (in S.
pombe) and CDC28 (in S. cerevisiae) genes. Cdc2 family
proteins from mammalian cells have also been identified.
Some, including Cdc2 (Lee and Nurse, Nature 327:31-35,
1987), Cdk2 (Elledge and Spotswood, EMBO J. 10:2653-2659,
1991; Tsai et al., Nature 353:174-177, 1991), and Cdk3
(Meyerson et al., EMBO J. 11:2909-2917, 1992), can
complement a cdc28- S. cerevisiae for growth.

The activity of the Cdc2 proteins at the G2/M
transition point is regulated in two ways: positively, by
association with regulatory proteins called cyclins, and
negatively, by phosphorylation of a tyrosine near their
ATP binding site. At least one of these regulatory
mechanisms is operative during G1 (see Figure 1A). At
this time, Cdc2 protein activity is regulated by
facultative association with different G1-specific
cyclins. In S. cerevisiae at least five putative G1
cyclins have been identified in genetic screens,
including the products of the CLN1, CLN2, CLN3, HCS26, and
CLB5 genes (Cross, Mol. Cell. Biol. 8:4675-4684, 1988;
Nash et al., EMBO J. 7:4335-4346, 1988; Hadwiger et al.,
Proc. Nat. Acad. Sci. USA 86:6255-6259, 1989; and Ogas et
al., Cell 66:1015-1026, 1991). The CLN1, CLN2, and CLN3
proteins (here called Cln1, Cln2, and Cln3) are each
individually sufficient to permit a cell to make the G1
to S transition (Richardson et al., Cell 59:1127-1133,
1989), and at least one of them (Cln2) associates with
Cdc28 in a complex that is active as a protein kinase
(Wittenberg et al., Cell 62:225-237, 1990). Recently,
putative G1 cyclins have been identified in mammalian
cells: Cyclin C, Cyclin D (three forms), and Cyclin E
(Koff et al., Cell 66:1217-1228, 1991; Xiong et al., Cell
65:691-699, 1991). Each of these three mammalian cyclins
complements a yeast deficient in Cln1, Cln2, and Cln3, and
each is expressed during G1.
In S. cerevisiae, the synthesis, and in some
cases the activity, of the G1 cyclins is under the
control of a network of genes that help to couple changes
in the extracellular environment to G1 regulatory
decisions (Figure 1A). For example, the SWI4 and SWI6
gene products positively regulate CLN1 and CLN2
transcription and may also positively modulate the
activity of Cln3 (Nasmyth and Dirick, Cell 66:995-1013,
1991), the FAR1 product negatively regulates both CLN2
transcription and the activity of its product (Chang and
Herskowitz, Cell 63:999-1011, 1990), and the FUS3 product
negatively regulates Cln3 activity (Elion et al., Cell
60:649-664, 1990).

Several lines of evidence suggest that mammalian
G1 to S transitions may be regulated by similar
mechanisms: regulatory molecules (Cdc2 kinases and
cyclins) similar to those found in yeast are observed in
mammalian G1, and, like S. cerevisiae, mammalian cells
arrest in G1 when deprived of nutrients and in response
to certain negative regulatory signals, including contact
with other cells or treatment with negative growth
factors (e.g., TGF-β) (Figure 1B). However, several
considerations suggest that the higher eukaryotic G1
regulatory machinery is likely to be more sophisticated
than that of yeast. First, in mammalian cells there
appear to be more proteins involved in the process. At
least ten different Cdc2 family proteins and related
protein kinases (see Meyerson et al., EMBO J. 11:2909-2917,
1992) and at least three distinct classes of
putative G1 cyclins (Koff et al., Cell 66:1217-1228,
1991; Matsushime et al., Cell 65:701-713, 1991; Motokura
et al., Nature 339:512-518, 1991; Xiong et al., Cell
65:691-699, 1991) have been identified. Second, unlike
that of yeast, the proliferation of most mammalian cells depends
on extracellular protein factors (in particular, positive
growth regulatory proteins), deprivation of which leads
to arrest in G1. Third, arrest of many cell types during
G1 can progress to a state, G0, that may not strictly
parallel any phase of the yeast cell cycle.

Because proteins involved in controlling normal
cell division decisions in mammals (e.g., humans) are
also very likely to play a key role in malignant cell
growth, identification and isolation of such proteins
facilitate the development of useful cancer diagnostics
as well as anti-cancer therapeutics. We now describe (i)
a novel system for the identification of proteins which,
at some time during their existence, participate in a
particular protein-protein interaction; (ii) the use of
this system to identify interacting proteins which are
key regulators of mammalian cell division; and (iii) one
such interacting protein, termed Cdi1, a cell cycle
control protein which provides a useful tool for cancer
diagnosis and treatment.

Summary of the Invention
In general, the invention features a method for
determining whether a first protein is capable of
physically interacting (i.e., directly or indirectly)
with a second protein. The method involves: (a)
providing a host cell which contains (i) a reporter gene
operably linked to a protein binding site; (ii) a first
fusion gene which expresses a first fusion protein, the
first fusion protein including the first protein
covalently bonded to a binding moiety which is capable of
specifically binding to the protein binding site; and
(iii) a second fusion gene which expresses a second
fusion protein, the second fusion protein including the
second protein covalently bonded to a weak gene
activating moiety; and (b) measuring expression of the
reporter gene as a measure of an interaction between the
first and the second proteins. In a preferred
embodiment, the method further involves isolating the
gene encoding the second protein.

In other preferred embodiments, the weak gene
activating moiety is of lesser activation potential than
GAL4 activation region II and preferably is the gene
activating moiety of B42 or a gene activating moiety of
lesser activation potential; the host cell is a yeast
cell; the reporter gene includes the LEU2 gene or the
lacZ gene; the host cell further contains a second
reporter gene operably linked to the protein binding
site, for example, the host cell includes both a LEU2
reporter gene and a lacZ reporter gene; the protein
binding site is a LexA binding site and the binding
moiety includes a LexA DNA binding domain; the second
protein is a protein involved in the control of
eukaryotic cell division, for example, a Cdc2 cell
division control protein.

In a second aspect, the invention features a
substantially pure preparation of Cdi1 polypeptide.
Preferably, the Cdi1 polypeptide includes an amino acid
sequence substantially identical to the amino acid
sequence shown in Figure 6 (SEQ ID NO: 1), and is derived
from a mammal, for example, a human.

In a related aspect, the invention features
purified DNA (for example, cDNA) which includes a
sequence encoding a Cdi1 polypeptide, and preferably a
human Cdi1 polypeptide, of the invention.

In other related aspects, the invention features a
vector and a cell which includes a purified DNA of the
invention; a purified antibody which specifically binds a
Cdi1 polypeptide of the invention; and a method of
producing a recombinant Cdi1 polypeptide involving
providing a cell transformed with DNA encoding a Cdi1
polypeptide positioned for expression in the cell;
culturing the transformed cell under conditions for
expressing the DNA; and isolating the recombinant Cdi1
polypeptide. The invention further features recombinant
Cdi1 polypeptide produced by such expression of a
purified DNA of the invention.

In yet another aspect, the invention features a
therapeutic composition which includes as an active
ingredient a Cdi1 polypeptide of the invention, the
active ingredient being formulated in a physiologically-acceptable
carrier. Such a therapeutic composition is
useful in a method of inhibiting cell proliferation in a
mammal, involving administering the therapeutic
composition to the mammal in a dosage effective to
inhibit mammalian cell division.

In a final aspect, the invention features a method
of detecting a malignant cell in a biological sample,
involving measuring Cdi1 gene expression in the sample, a
change in Cdi1 expression relative to a wild-type sample
being indicative of the presence of the malignant cell.

As used herein, by "reporter gene" is meant a gene
whose expression may be assayed; such genes include,
without limitation, lacZ, amino acid biosynthetic genes,
e.g., the yeast LEU2, HIS3, LYS2, or URA3 genes, nucleic
acid biosynthetic genes, the mammalian chloramphenicol
transacetylase (CAT) gene, or any surface antigen gene
for which specific antibodies are available.
By "operably linked" is meant that a gene and a
regulatory sequence(s) are connected in such a way as to
permit gene expression when the appropriate molecules
(e.g., transcriptional activator proteins or proteins
which include transcriptional activation domains) are
bound to the regulatory sequence(s).

By a "binding moiety" is meant a stretch of amino
acids which is capable of directing specific polypeptide
binding to a particular DNA sequence (i.e., a "protein
binding site").

By "weak gene activating moiety" is meant a
stretch of amino acids which is capable of weakly
inducing the expression of a gene to whose control region
it is bound. As used herein, by "weakly" is meant below the
level of activation effected by GAL4 activation region II
(Ma and Ptashne, Cell 48:847, 1987) and preferably at
or below the level of activation effected by the B42
activation domain of Ma and Ptashne (Cell 51:113, 1987).
Levels of activation may be measured using any downstream
reporter gene system and comparing, in parallel assays,
the level of expression stimulated by the GAL4 region II
polypeptide with the level of expression stimulated by
the polypeptide to be tested.
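
The parallel-assay comparison in the definition above can be sketched in code. The following Python fragment is offered for illustration only; it assumes that reporter expression from the three parallel assays is available as plain numbers in the same units (for example, β-galactosidase units). The function name, its return labels, and any values passed to it are hypothetical and form no part of the described system.

```python
def classify_activation_moiety(test_level, gal4_region_ii_level, b42_level):
    """Classify a candidate activating moiety from parallel reporter assays.

    All three arguments are reporter expression levels measured in the same
    assay; the units are arbitrary but must match. Names are illustrative.
    """
    if b42_level > gal4_region_ii_level:
        raise ValueError("expected B42 to activate no more strongly than GAL4 region II")
    if test_level >= gal4_region_ii_level:
        return "not weak"          # at or above GAL4 activation region II
    if test_level <= b42_level:
        return "weak, preferred"   # at or below the B42 activation domain
    return "weak"                  # below GAL4 region II but above B42
```

A moiety measured at the same level as B42 is reported as "weak, preferred", reflecting the "at or below" language of the definition.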

By "substantially pure" is meant a preparation
which is at least 60% by weight (dry weight) the compound
of interest, e.g., a Cdi1 polypeptide. Preferably the
preparation is at least 75%, more preferably at least
90%, and most preferably at least 99%, by weight the
compound of interest. Purity can be measured by any
appropriate method, e.g., column chromatography,
polyacrylamide gel electrophoresis, or HPLC analysis.
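
The purity thresholds recited above (60%, 75%, 90%, and 99% by weight) can be expressed as a simple lookup. The Python sketch below is illustrative only; the function name and the tier labels are hypothetical, and the input is assumed to be a dry-weight percentage obtained by any of the methods mentioned.

```python
def purity_tier(percent_by_weight):
    """Map a dry-weight purity percentage onto the tiers recited in the text."""
    if not 0 <= percent_by_weight <= 100:
        raise ValueError("purity must be a percentage between 0 and 100")
    if percent_by_weight >= 99:
        return "most preferred"
    if percent_by_weight >= 90:
        return "more preferred"
    if percent_by_weight >= 75:
        return "preferred"
    if percent_by_weight >= 60:
        return "substantially pure"
    return "not substantially pure"
```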

By "purified DNA" is meant DNA that is not
immediately contiguous with both of the coding sequences
with which it is immediately contiguous (one on the 5'
end and one on the 3' end) in the naturally occurring
genome of the organism from which it is derived. The
term therefore includes, for example, a recombinant DNA
which is incorporated into a vector; into an autonomously
replicating plasmid or virus; or into the genomic DNA of
a prokaryote or eukaryote, or which exists as a separate
molecule (e.g., a cDNA or a genomic DNA fragment produced
by PCR or restriction endonuclease treatment) independent
of other sequences. It also includes a recombinant DNA
which is part of a hybrid gene encoding additional
polypeptide sequence.

By "substantially identical" is meant an amino
acid sequence which differs only by conservative amino
acid substitutions, for example, substitution of one
amino acid for another of the same class (e.g., valine
for glycine, arginine for lysine, etc.) or by one or more
non-conservative substitutions, deletions, or insertions
located at positions of the amino acid sequence which do
not destroy the function of the protein (assayed, e.g.,
as described herein). A "substantially identical"
nucleic acid sequence codes for a substantially identical
amino acid sequence as defined above.
By "transformed cell" is meant a cell into which
(or into an ancestor of which) has been introduced, by
means of recombinant DNA techniques, a DNA molecule
encoding (as used herein) a Cdi1 polypeptide.

By "positioned for expression" is meant that the
DNA molecule is positioned adjacent to a DNA sequence
which directs transcription and translation of the
sequence (i.e., facilitates the production of, e.g., a
Cdi1 polypeptide).

By "purified antibody" is meant antibody which is
at least 60%, by weight, free from the proteins and
naturally-occurring organic molecules with which it is
naturally associated. Preferably, the preparation is at
least 75%, more preferably at least 90%, and most
preferably at least 99%, by weight, antibody, e.g.,
Cdi1-specific antibody. A purified Cdi1 antibody may be
obtained, for example, by affinity chromatography using
recombinantly-produced Cdi1 polypeptide and standard
techniques.

By "specifically binds" is meant an antibody which
recognizes and binds Cdi1 polypeptide but which does not
substantially recognize and bind other molecules in a
sample, e.g., a biological sample, which naturally
includes Cdi1 polypeptide.

By a "malignant cell" is meant a cell which has
been released from normal cell division control.
Included in this definition are transformed and
immortalized cells.

The interaction trap system described herein
provides advantages over more conventional methods for
isolating interacting proteins or genes encoding
interacting proteins. Most notably, applicants' system
provides a rapid and inexpensive method having very
general utility for identifying and purifying genes
encoding a wide range of useful proteins based on the
protein's physical interaction with a polypeptide of
known diagnostic or therapeutic usefulness. This general
utility derives in part from the fact that the components
of the system can be readily modified to facilitate
detection of protein interactions of widely varying
affinity (e.g., by using reporter genes which differ
quantitatively in their sensitivity to a protein
interaction). The inducible nature of the promoter used
to express the interacting proteins also increases the
scope of candidate interactors which may be detected,
since even proteins whose chronic expression is toxic to
the host cell may be isolated simply by inducing a short
burst of the protein's expression and testing for its
ability to interact and stimulate expression of a
β-galactosidase reporter gene.

Moreover, detection of interacting proteins
through the use of a weak gene activation domain tag
avoids the restrictions on the pool of available
candidate interacting proteins which are
characteristically associated with stronger activation
domains (such as GAL4 or VP16); although the mechanism is
unclear, such a restriction apparently results from low
to moderate levels of host cell toxicity mediated by the
strong activation domain.

Other features and advantages of the invention 
will be apparent from the following detailed description 
thereof, and from the claims. 

Brief Description of the Drawings 

The drawings are first briefly described.

FIGURE 1 illustrates cell cycle control systems.
FIGURE 1A illustrates G1 control in yeast. FIGURE 1B
illustrates cell cycle control in yeast and mammals.

FIGURE 2A-C illustrates an interaction trap
system according to the invention.
FIGURE 3A is a diagrammatic representation of a
"bait" protein useful in the invention; the numbers
represent amino acids. FIGURE 3B is a diagrammatic
representation of reporter genes useful in the invention.
FIGURE 3C is a diagrammatic representation of a library
expression plasmid useful in the invention and the
N-terminal amino acid sequence of an exemplary "prey"
protein according to the invention.

FIGURE 4 depicts yeast assays demonstrating the
specificity of the Cdi1/Cdc2 interaction.

FIGURE 5 shows the results of an
immunoprecipitation experiment demonstrating that Cdi1
physically interacts with Cdc2.

FIGURE 6 shows the Cdi1 coding sequence together
with the predicted amino acid sequence of its open
reading frame (SEQ ID NO: 1).

In FIGURE 7A, the growth rates of yeast cells that
express Cdi1 are depicted; open squares are cells
transformed with expression vectors only; ovals are cells
expressing Cdc2; triangles are cells expressing Cdi1; and
filled squares are cells expressing Cdi1 and Cdc2. In
FIGURE 7B is shown a budding index of yeast that express
Cdi1. In FIGURE 7C is shown a FACS analysis of yeast
that express Cdi1; fluorescence (on the x-axis) is shown
as a function of cell number (on the y-axis).

FIGURE 8A shows the morphology of control cells;
FIGURE 8B shows the morphology of control cells stained
with DAPI; FIGURE 8C shows the morphology of cells
expressing Cdi1; and FIGURE 8D shows the morphology of
cells expressing Cdi1 stained with DAPI.

FIGURE 9A indicates the timing of Cdi1 expression
in HeLa cells; lanes represent different timepoints: (1)
0h, (2) 3h, (3) 6h, (4) 9h, (5) 12h, (6) 15h, (7) 18h,
(8) 21h, (9) 24h, and (10) 27h after release. FIGURE 9B
shows the effect of Cdi1 overexpression.
FIGURE 10 shows an alignment of Cdc2 proteins and
FUS3. Depicted is an alignment of the sequences of the
bait proteins used herein. Amino acids are numbered as
in human Cdc2. Abbreviations are as follows: HsCdc2,
human Cdc2; HsCdk2, human Cdk2; ScCdc28, S. cerevisiae
Cdc28; DmCdc2 and DmCdc2c, the two Drosophila Cdc2
isolates; and ScFus3, S. cerevisiae FUS3. Residues shown
in boldface are conserved between the Cdc2 family
members; residues present in Fus3 are also shown in bold.
Asterisks indicate potential Cdi1 contact points, i.e.,
amino acids that are conserved among human Cdc2, Cdk2, S.
cerevisiae Cdc28, and Drosophila Cdc2, but that differ in
Drosophila Cdc2c and in Fus3.

There now follows a description of one example of
an interaction trap system and its use for isolating a
particular cell division protein. This example is
designed to illustrate, not limit, the invention.

Detailed Description 
Applicants have developed an in vivo interaction
trap system for the isolation of genes encoding proteins
which physically interact with a second protein of known
diagnostic or therapeutic utility. The system involves a
eukaryotic host strain (e.g., a yeast strain) which is
engineered to express the protein of therapeutic or
diagnostic interest as a fusion protein covalently bonded
to a known DNA binding domain; this protein is referred
to as a "bait" protein because its purpose in the system
is to "catch" useful, but as yet unknown or
uncharacterized, interacting polypeptides (termed the
"prey"; see below). The eukaryotic host strain also
contains one or more "reporter genes", i.e., genes whose
transcription is detected in response to a bait-prey
interaction. Bait proteins, via their DNA binding
domain, bind to their specific DNA site upstream of a
reporter gene; reporter transcription is not stimulated,
however, because the bait protein lacks its own
activation domain.

To isolate genes encoding novel interacting
proteins, cells of this strain (containing a reporter
gene and expressing a bait protein) are transformed with
individual members of a DNA (e.g., a cDNA) expression
library; each member of the library directs the synthesis
of a candidate interacting protein fused to a weak and
invariant gene activation domain tag. Those library-encoded
proteins that physically interact with the
promoter-bound bait protein are referred to as "prey"
proteins. Such bound prey proteins (via their activation
domain tag) detectably activate the transcription of the
downstream reporter gene and provide a ready assay for
identifying particular cells which harbor a DNA clone
encoding an interacting protein of interest.

One example of such an interaction trap system is
shown in Figure 2. Figure 2A shows a yeast strain
containing two reporter genes, LexAop-LEU2 and LexAop-lacZ,
and a constitutively expressed bait protein, LexA-Cdc2.
Synthesis of prey proteins is induced by growing
the yeast in the presence of galactose. Figure 2B shows
that, if the prey protein does not interact with the
transcriptionally-inert LexA-fusion bait protein, the
reporter genes are not transcribed; the cell cannot grow
into a colony on leu- medium, and it is white on Xgal
medium because it contains no β-galactosidase activity.
Figure 2C shows that, if the prey protein interacts with
the bait, then both reporter genes are active; the cell
forms a colony on leu- medium, and cells in that colony
have β-galactosidase activity and are blue on Xgal
medium.
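
The readout logic of Figures 2B and 2C can be summarized as a small truth table. The Python sketch below is a simplified, all-or-none model offered for illustration only; the class name, field names, and function names are hypothetical, and real reporter output is graded rather than boolean.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    bait_on_operator: bool   # LexA-bait fusion occupies the LexA operators
    prey_induced: bool       # galactose present, so the prey fusion is synthesized
    prey_binds_bait: bool    # prey physically interacts with the bait


def reporters_active(cell):
    """Both reporters are transcribed only when an activation-tagged prey
    is recruited to the operator-bound bait."""
    return cell.bait_on_operator and cell.prey_induced and cell.prey_binds_bait


def phenotype(cell):
    """Return the expected plate phenotypes under this simplified model."""
    active = reporters_active(cell)
    return {
        "grows_without_leucine": active,                        # LexAop-LEU2
        "color_on_xgal": "blue" if active else "white",         # LexAop-lacZ
    }
```

Under this model, a cell with a non-interacting prey (Figure 2B) yields no growth without leucine and a white colony, while a cell with an interacting prey (Figure 2C) grows and is blue.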

As described herein, in developing the interaction
trap system shown diagrammatically in Figure 2, careful
attention was paid to three classes of components: (i)
use of bait proteins that contained a site-specific DNA
binding domain that was known to be transcriptionally
inert; (ii) use of reporter genes that had essentially no
basal transcription and that were bound by the bait
protein; and (iii) use of library-encoded prey proteins,
all of which were expressed as chimeras whose amino
termini contained the same weak activation domain and,
preferably, other useful moieties, such as nuclear
localization signals.

Each component of the system is now described in 
more detail. 
Bait Proteins 

The selection host strain depicted in Figure 2
contains a Cdc2 bait and a DNA binding moiety derived
from the bacterial LexA protein (see Figure 3A). The use
of a LexA DNA binding domain provides certain advantages.
For example, in yeast, the LexA moiety contains no
activation function and has no known effect on
transcription of yeast genes (Brent and Ptashne, Nature
312:612-615, 1984; Brent and Ptashne, Cell 43:729-736,
1985). In addition, use of the LexA rather than the GAL4
DNA-binding domain allows conditional expression of prey
proteins in response to galactose induction; this
facilitates detection of prey proteins which might be
toxic to the host cell if expressed continuously.
Finally, the use of LexA allows knowledge regarding the
interaction between LexA and the LexA binding site (i.e.,
the LexA operator) to be exploited for the purpose of
optimizing operator occupancy.

The bait protein illustrated in Figure 3A also
includes a LexA dimerization domain; this optional domain
facilitates efficient LexA dimer formation. Because LexA
binds its DNA binding site as a dimer, inclusion of this
domain in the bait protein also optimizes the efficiency
of operator occupancy (Golemis and Brent, Mol. Cell. Biol.
12:3006-3014, 1992).

LexA represents a preferred DNA binding domain in
the invention. However, any other transcriptionally-inert
or essentially transcriptionally-inert DNA binding
domain may be used in the interaction trap system; such
DNA binding domains are well known and include the DNA
binding portions of the proteins ACE1 (CUP1), lambda cI,
lac repressor, jun, fos, or GCN4. For the above-described
reasons, the GAL4 DNA binding domain represents a
slightly less preferred DNA binding moiety for the bait
proteins.

Bait proteins may be chosen from any protein of
known or suspected diagnostic or therapeutic importance.
Preferred bait proteins include oncoproteins (such as
myc, particularly the C-terminus of myc, ras, src, fos,
and particularly the oligomeric interaction domains of
fos) or any other proteins involved in cell cycle
regulation (such as kinases, phosphatases, the
cytoplasmic portions of membrane-associated receptors,
and other Cdc2 family members). In each case, the
protein of diagnostic or therapeutic importance would be
fused to a known DNA binding domain as generally
described for LexA-Cdc2.

Reporters

As shown in Figure 3B, one preferred host strain
according to the invention contains two different
reporter genes, the LEU2 gene and the lacZ gene, each
carrying an upstream binding site for the bait protein.
The reporter genes depicted in Figure 3B each include, as
an upstream binding site, one or more LexA operators in
place of their native Upstream Activation Sequences
(UASs). These reporter genes may be integrated into the
chromosome or may be carried on autonomously replicating
plasmids (e.g., yeast 2µ plasmids).
A combination of two such reporters is preferred
in the invention for a number of reasons. First, the
LexAop-LEU2 construction allows cells that contain
interacting proteins to select themselves by growth on
medium that lacks leucine, facilitating the examination
of large numbers of potential interactor protein-containing
cells. Second, the LexAop-lacZ reporter
allows LEU+ cells to be quickly screened to confirm an
interaction. And, third, among other technical
considerations (see below), the LexAop-LEU2 reporter
provides an extremely sensitive first selection, while
the LexAop-lacZ reporter allows discrimination between
proteins of different interaction affinities.
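
The two-reporter strategy described above, a sensitive LEU2 selection followed by a lacZ screen that discriminates among interactors, can be sketched as a filter followed by a ranking. The Python fragment below is illustrative only; the record layout (the keys "clone", "grows_without_leucine", and "beta_gal_units") and the function name are hypothetical, and no measured data are implied.

```python
def two_stage_screen(transformants):
    """Return clone names that pass the LEU2 selection, ordered by lacZ signal.

    Each transformant is a dict with the hypothetical keys
    'clone', 'grows_without_leucine', and 'beta_gal_units'.
    """
    # Stage 1: selection. Only cells that grow without leucine are kept.
    leu_plus = [t for t in transformants if t["grows_without_leucine"]]
    # Stage 2: screen. Rank survivors by beta-galactosidase activity,
    # strongest signal first, as a rough proxy for interaction strength.
    ranked = sorted(leu_plus, key=lambda t: t["beta_gal_units"], reverse=True)
    return [t["clone"] for t in ranked]
```

The ranking step treats β-galactosidase activity as a proxy for interaction strength, which is a simplification of the discrimination described in the text.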

Although the reporter genes described herein
represent a preferred embodiment of the invention, other
equivalent genes whose expression may be detected or
assayed by standard techniques may also be employed in
conjunction with, or instead of, the LEU2 and lacZ genes.
Examples of other useful genes whose transcription can be
detected include amino acid and nucleic acid biosynthetic
genes (such as yeast HIS3, URA3, and LYS2), GAL1, E. coli
galK (which complements the yeast GAL1 gene), and the
higher cell reporter genes CAT, GUS, and any gene
encoding a cell surface antigen for which antibodies are
available (e.g., CD4).
Prey proteins 

In the selection described herein, a fourth DNA
construction was utilized which encoded a series of
candidate interacting proteins, each fused to a weak
activation domain (i.e., prey proteins). One such prey
protein construct is shown in Figure 3C; this plasmid
encodes a prey fusion protein which includes an invariant
N-terminal moiety. This moiety carries, amino to carboxy
terminal, an ATG for protein expression, an optional
nuclear localization sequence, a weak activation domain
(i.e., the B42 activation domain of Ma and Ptashne; Cell
51:113, 1987), and an optional epitope tag for rapid
immunological detection of fusion protein synthesis. As
described herein, a HeLa cDNA library was constructed,
and random library sequences were inserted downstream of
this N-terminal fragment to produce fusion genes encoding
prey proteins.
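
The amino-to-carboxy order of the invariant N-terminal moiety described above can be written out as an ordered list. The Python sketch below is illustrative only; it uses descriptive labels rather than actual sequences, and the function name and parameters are hypothetical.

```python
def prey_fusion_layout(cdna_insert, include_nls=True, include_epitope_tag=True):
    """List the parts of a prey fusion in amino-to-carboxy terminal order.

    Labels are descriptive placeholders, not nucleotide or amino acid sequences.
    """
    parts = ["ATG"]                                  # start of protein expression
    if include_nls:
        parts.append("nuclear localization sequence")  # optional
    parts.append("B42 activation domain")            # weak activation domain
    if include_epitope_tag:
        parts.append("epitope tag")                  # optional
    parts.append(cdna_insert)                        # library-derived sequence
    return parts
```

For example, calling prey_fusion_layout("HeLa cDNA insert", include_nls=False) omits the optional localization sequence while keeping the remaining order.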

Prey proteins other than those described herein
are also useful in the invention. For example, cDNAs may
be constructed from any mRNA population and inserted into
an equivalent expression vector. Such a library of
choice may be constructed de novo using commercially
available kits (e.g., from Stratagene, La Jolla, CA) or
using well established preparative procedures (see, e.g.,
Current Protocols in Molecular Biology, New York, John
Wiley & Sons, 1987). Alternatively, a number of cDNA
libraries (from a number of different organisms) are
publicly and commercially available; sources of
libraries include, e.g., Clontech (Palo Alto, CA) and
Stratagene (La Jolla, CA). It is also noted that prey
proteins need not be naturally occurring full length
polypeptides. For example, a prey protein may be encoded
by a synthetic sequence or may be the product of a
randomly generated open reading frame or a portion
thereof. In one particular example, the prey protein
includes only an interaction domain; such a domain may be
useful as a therapeutic to modulate bait protein
activity.

Similarly, other weak activation domains may be
substituted for the B42 portion of the prey molecule;
such activation domains must be weaker than the GAL4
activation region II moiety and preferably should be no
stronger than B42 (as measured, e.g., by a comparison
with GAL4 activation region II or B42 in parallel
β-galactosidase assays using lacZ reporter genes); such a
domain may, however, be weaker than B42. In particular,
the extraordinary sensitivity of the LEU2 selection
scheme (described above) allows even extremely weak
activation domains to be utilized in the invention.
Examples of other useful weak activation domains include
B17, B112, and the amphipathic helix (AH) domains
described in Ma and Ptashne (Cell 51:113, 1987), Ruden et
al. (Nature 350:426-430, 1991), and Giniger and Ptashne
(Nature 330:670, 1987).

Finally, the prey proteins, if desired, may
include other optional nuclear localization sequences
(e.g., those derived from the GAL4 or MATα2 genes) or
other optional epitope tags (e.g., portions of the c-myc
protein or the flag epitope available from Immunex).
These sequences optimize the efficiency of the system,
but are not absolutely required for its operation. In
particular, the nuclear localization sequence optimizes
the efficiency with which prey molecules reach the
nuclear-localized reporter gene construct(s), thus
increasing their effective concentration and allowing one
to detect weaker protein interactions; and the epitope
tag merely facilitates a simple immunoassay for fusion
protein expression.

Those skilled in the art will also recognize that
the above-described reporter gene, DNA binding domain,
and gene activation domain components may be derived from
any appropriate eukaryotic or prokaryotic source,
including yeast, mammalian cell, and prokaryotic cell
genomes or cDNAs as well as artificial sequences.
Moreover, although yeast represents a preferred host
organism for the interaction trap system (for reasons of
ease of propagation, genetic manipulation, and large
scale screening), other host organisms such as mammalian
cells may also be utilized. If a mammalian system is
chosen, a preferred reporter gene is the sensitive and
easily assayed CAT gene; useful DNA binding domains and
gene activation domains may be chosen from those
described above (e.g., the LexA DNA binding domain and
the B42 or B112 activation domains).

The general type of interaction trap system
described herein provides a number of advantages. For
example, the system can be used to detect bait-prey
interactions of varying affinity. This can be
accomplished, e.g., by using reporter genes which differ
quantitatively in their sensitivity to an interaction
with a library protein. In particular, the equilibrium
Kd with which a library-encoded protein must interact
with the bait to activate the LexAop-LEU2 reporter is
probably ≤10⁻⁶ M. This value is clearly sufficient to
detect protein interactions that are weaker and shorter
lived than those detected, e.g., by typical physical
methods. The lacZ reporters are less sensitive, allowing
the selection of different prey proteins by utilizing
reporters with the appropriate number, affinity, and
position of LexA operators; in particular, sensitivity of
the lacZ reporter gene is increased by increasing
the number of upstream LexA operators, utilizing LexA
operators which have increased affinity for binding LexA
dimers, and/or decreasing the distance between the LexA
operator and the downstream reporter gene promoter. This
ability to manipulate the sensitivity of the system
provides a measure of control over the strength of the
interactions detected and thus increases the range of
proteins which may be isolated.

The system provides at least three other
advantages. First, the activation region on the library-
encoded proteins is relatively weak, in order to avoid
restrictions on the spectrum of library proteins
detected; such restrictions are common when utilizing a
strong, semi-toxic activation domain such as that of GAL4
or VP16 (Gill and Ptashne, Nature 334:721-724, 1988;
Triezenberg et al., Genes Dev. 2:730-742, 1988; Berger et
al., Cell 70:251-265, 1992). Second, the use of LexA to
bind the bait to DNA allows the use of GAL4+ yeast hosts
and the use of the GAL1 promoter to effect conditional
expression of the library protein. This in turn allows
the Leu or lacZ phenotypes to be unconditionally ascribed
to expression of the library protein and minimizes the
number of false positives; it also allows conditional
expression and selection of interactor proteins which are
toxic to the host cell if continuously produced. And
third, placing the activation domain at the amino
terminus, rather than at the carboxy terminus, of the
fusion protein guarantees that the activation domain
portion of the protein will be translated in frame, and
therefore that one out of three fusion genes will encode
a candidate activation domain-tagged interactor protein.

One particular interaction trap system is now
described. The use of this system to isolate a protein
(termed Cdi1) which physically interacts with a known
cell division control protein (termed Cdc2) is also
illustrated.

Isolation and Characterization of Cdi1
Isolation of the Cdi1 cDNA

To isolate proteins which interact with the cell
division control protein Cdc2, the yeast strain
EGY48/p1840 was utilized. This strain contained both the
LexAop-LEU2 and LexAop-lacZ reporters, as well as a
plasmid that directed the synthesis of a LexA-Cdc2 bait
protein (see below). The LexAop-LEU2 reporter replaced
the chromosomal LEU2 gene. This reporter carried 3
copies of the high affinity colE1 double LexA operator
(Ebina et al., J. Biol. Chem. 258:13258-13261, 1983) 40
nucleotides upstream of the major LEU2 transcription
startpoint. The LexAop-lacZ reporter (p1840) was carried
on a URA3+ 2μ plasmid. This reporter carried a single
LexA operator 167 nucleotides upstream of the major GAL1
transcription startpoint.

A HeLa cDNA interaction library (described below)
was also introduced into this strain using the plasmid
depicted in Figure 3C (termed pJG4-5); this library
vector was designed to direct the conditional expression
of proteins under the control of a derivative of the GAL1
promoter. This plasmid carried a 2μ replicator and a
TRP1+ selectable marker. cDNA was inserted into this
plasmid on EcoRI-XhoI fragments. Downstream of the XhoI
site, pJG4-5 contained the ADH1 transcription terminator.
The sequence of an invariant 107 amino acid moiety,
encoded by the plasmid and fused to the N-terminus of all
library proteins, is shown below the plasmid map in
Figure 3C. This moiety carries, amino to carboxy
terminal, an ATG, the SV40 T nuclear localization
sequence (Kalderon et al., Cell 39:499-509, 1984), the
B42 transcription activation domain (Ma and Ptashne,
Cell 51:113-119, 1987; Ruden et al., Nature 350:426-430,
1991), and the 12CA5 epitope tag from the influenza virus
hemagglutinin protein (Green et al., Cell 28:477-487,
1982).

Following introduction of the prey-encoding
plasmids into EGY48/p1840, over a million transformants
were isolated, of which 3-4 × 10⁵ expressed fusion
proteins (see experimental procedures below). The
colonies were pooled, diluted, and grown for five hours
in liquid culture in the presence of galactose to induce
synthesis of library-encoded proteins. The pool was then
diluted again so that each original transformant was
represented about 20 times and plated on galactose-
containing medium without leucine. From about 2 × 10⁷
cells, 412 LEU2+ colonies were isolated. Only 55 of these
colonies were blue on galactose Xgal medium, presumably
due to the lower sensitivity of the lacZ reporter. In
all cells in which both reporters were active, both
phenotypes were galactose-dependent, confirming that they
required the library-encoded protein. Library plasmids
were rescued from these cells and assigned to one of three
classes by restriction mapping, and the plasmids
from each class that contained the longest
cDNA inserts were identified. Synthesis of a fusion protein by the
plasmid was verified in each case by Western blot
analysis using anti-epitope antiserum.
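
The figures quoted for this screen can be tied together with simple arithmetic. The following Python sketch is offered only as an illustration: the variable names are hypothetical, and the transformant count is taken as the lower bound of "over a million".

```python
# Illustrative arithmetic for the library screen described above.
# Inputs are the approximate figures quoted in the text; the variable
# names are hypothetical and exist only for this sketch.

transformants = 1_000_000   # "over a million" primary transformants (lower bound)
cells_plated = 2e7          # cells plated on galactose medium lacking leucine
leu_plus = 412              # colonies that grew without leucine
blue = 55                   # of those, colonies that were also blue on Xgal

# Average number of times each original transformant was plated.
coverage = cells_plated / transformants

# Frequency of leucine-independent colonies among the plated cells.
leu_frequency = leu_plus / cells_plated

# Fraction of leucine-independent colonies that also activated lacZ.
blue_fraction = blue / leu_plus

print(f"coverage per transformant: about {coverage:.0f}x")
print(f"Leu+ frequency among plated cells: {leu_frequency:.2e}")
print(f"Leu+ colonies that were also blue: {blue_fraction:.1%}")
```

Under these inputs the coverage works out to about 20-fold, matching the dilution described in the text.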

Further analysis by detailed mapping and partial
DNA sequencing showed that two of the recovered cDNA
classes were identical to previously identified genes
encoding CKS1hs and CKS2hs (Richardson et al., Genes Dev.
4:1332-1344, 1990), human homologs of the S. pombe suc1+
product. Sequencing of the third restriction map class
showed it to be a previously unidentified gene. This
gene was termed CDI1, for Cdc2 Interactor 1; its protein
product was termed Cdi1.

The CDI1 gene was introduced into a panel of
EGY48-derived strains (i.e., EGY48/p1840 containing
different LexA fusion baits) in order to test the
reproducibility and specificity of the interaction
between Cdc2 and Cdi1. Cells from 8 individual
transformants that contained Cdi1 plus a given bait
(horizontal streaks) or the same bait plus the library
vector as a control (adjacent vertical streaks) were
streaked with toothpicks onto each of three plates.
The plates, shown in Figure 4, included a
"control" plate, a Ura⁻ Trp⁻ His⁻ glucose plate which
selected for the presence of the bait plasmid, the
LexAop-lacZ reporter, and the Cdi1 expression plasmid; a
"glucose" plate, a Ura⁻ Trp⁻ His⁻ Leu⁻ glucose plate,
which additionally selected for activation of the LexAop-
LEU2 reporter; and a "galactose" plate, a Ura⁻ Trp⁻ His⁻
Leu⁻ galactose plate, which selected for the activation
of the LexAop-LEU2 reporter, and which induced the
expression of Cdi1. Baits used in this test included:
(1) LexA-Cdc2, (2) LexA-Bicoid, (3) LexA-Max, (4) LexA-
Cln3, (5) LexA-Fus3, and (6) LexA-cMyc-Cterm (Figure 4).

As judged by the LEU2 and lacZ transcription
phenotypes, Cdi1 interacted specifically with LexA-Cdc2,
and did not interact with LexA-cMyc-Cterm, LexA-Max,
LexA-Bicoid, LexA-Cln3, or LexA-Fus3 (Figure 4). Cdi1
also interacted with other Cdc2 family proteins,
including LexA-Cdc28, as discussed below. Applicants
also note that, on glucose, the LexA-Cln3 bait weakly
activated the LexAop-LEU2 reporter, but that, on
galactose, the inferiority of the carbon source and the
diminished bait expression from the ADH1 promoter
eliminated this background.

The specificity of the Cdi1/Cdc2 interaction was
then confirmed by physical criteria, in particular, by
immunoprecipitation experiments. Extracts were made from
EGY48 cells that contained a library plasmid that
directed the synthesis of tagged Cdi1 and that also
contained either a LexA-Cdc2 or a LexA-Bicoid bait.

In particular, 100 ml of cells were grown in
glucose or galactose medium (in which Cdi1 expression was
induced) to an OD600 of 0.6-0.8, pelleted by
centrifugation, resuspended in 500 μl RIPA, lysed by
beating with glass beads five times for two minutes each,
and spun twice for five minutes in a microfuge (10,000 × g)
at 4°C to remove the beads and cell debris. 5 μl of
this supernatant was taken as a control, and 15 μl of
rabbit anti-LexA antiserum was added to the remainder,
which was incubated at 4°C for four hours on a rotating
platform. LexA-containing proteins were first
precipitated from this remainder with 50 μl Staph A-coated
sepharose beads (Pharmacia, Piscataway, NJ) as described
in Wittenberg and Reed (Cell 54:1061-1072, 1988). The
entire pellet was then dissolved in Laemmli sample
buffer, run on a 12.5% protein gel (SDS/PAGE), and
blotted onto nitrocellulose. Tagged Cdi1 fusion proteins
were identified by Western analysis of the blotted
proteins with the 12CA5 monoclonal antihemagglutinin
antibody essentially as described in Samson et al. (Cell
57:1045-1052, 1989).

The results are shown in Figure 5; the lanes are
as follows: (1) Galactose medium, LexA-Bicoid bait,
immunoprecipitation; (2) Glucose medium, LexA-Bicoid
bait, immunoprecipitation; (3) Galactose medium, LexA-
Bicoid bait, cell extract; (4) Glucose medium, LexA-
Bicoid bait, cell extract; (5) Galactose medium, LexA-
Cdc2 bait, immunoprecipitation; (6) Glucose medium, LexA-
Cdc2 bait, immunoprecipitation; (7) Galactose medium,
LexA-Cdc2 bait, cell extract; and (8) Glucose medium,
LexA-Cdc2 bait, cell extract. As shown in Figure 5,
anti-LexA antiserum precipitated Cdi1 from a yeast
extract that contained LexA-Cdc2 and Cdi1, but not from
one that contained LexA-Bicoid and Cdi1, thus confirming
that Cdi1 physically interacted only with the Cdc2-
containing bait protein.

The Cdi1 Protein Product

To analyze the Cdi1 protein product, the Cdi1 cDNA
was isolated from 12 different library plasmids that
contained cDNAs of 4 different lengths. Sequence
analysis revealed that all of the cDNA inserts contained
an open reading frame, and inspection of the sequence of
the longest cDNAs (Figure 6) revealed an ATG with a
perfect match to the Kozak consensus translation
initiation sequence (PuCC/GATGG) (Kozak, Cell 44:283-292,
1986). Careful analysis of the size of the Cdi1 mRNA in
HeLa cells revealed that this ATG occurred between 15 and
45 nucleotides from the 5' end of the Cdi1 message,
suggesting that the longest cDNAs spanned the entire open
reading frame.

The Cdi1 gene is predicted to encode a protein of
212 amino acids. The Cdi1 amino acid sequence does not
reveal compelling similarities to any previously
identified proteins (Figure 6). However, two facts about
the protein sequence are worth noting. First, 19 of the
amino-terminal 35 amino acids are either proline,
glutamic acid, serine, or threonine. Proteins that
contain these stretches, called PEST sequences, are
thought to be degraded rapidly (Rogers et al., Science
234:364-368, 1986); in fact, this stretch of Cdi1 is more
enriched in these amino acids than the C-termini of the
yeast G1 cyclins, in which the PEST sequences are known
to be functional (Cross, Mol. Cell. Biol. 8:4675-4684,
1988; Nash et al., EMBO J. 7:4335-4346, 1988; Hadwiger et
al., Proc. Nat. Acad. Sci. USA 86:6255-6259, 1989).
Second, since the cDNA library from which the plasmids
that encoded Cdi1 were isolated was primed with oligo dT,
and since all isolated Cdi1 cDNAs by definition encoded
proteins that interacted with Cdc2, analysis of the sizes
of Cdi1 cDNA inserts obtained in the screen necessarily
localized the portion of the protein sufficient for
interaction with Cdc2 to Cdi1's C-terminal ~170 amino
acids.

Analysis of Cdi1 Function in Yeast

In initial efforts to understand Cdi1 function,
the effects of Cdi1 expression in yeast were examined.
In particular, because Cdi1 interacts with Cdc2 family
proteins, including S. cerevisiae Cdc28, an examination
of whether Cdi1 affected phenotypes that depended on
other known proteins that interact with Cdc28 was
undertaken.

Toward this end, the fact that expression of the
S. pombe suc1+ or S. cerevisiae Cks proteins can rescue
the temperature sensitivity of strains that bear certain
cdc28ts alleles was exploited; this effect is thought to
be due to the ability of these proteins to form complexes
with the labile Cdc28ts protein, protecting it against
thermal denaturation (Hadwiger et al., Proc. Nat. Acad.
Sci. USA 86:6255-6259, 1989). It was found that Cdi1
expression did not rescue the temperature-sensitivity of
any cdc28 allele tested, although human Cks2 did.

Next, the ability of Cdi1 to confer on yeast
either of two phenotypes associated with expression of S.
cerevisiae or higher eukaryotic cyclins was examined;
such phenotypes include resistance to the arrest of MATa
strains by α factor, and rescue of growth arrest of a
strain deficient in Cln1, Cln2, and Cln3. Again,
however, Cdi1 expression did not confer either phenotype.

During initial studies, it was noted that
expression of Cdi1 inhibited yeast cell cycle
progression. Cultures of cells that expressed Cdi1
increased their cell number and optical density more
slowly than control populations (Figure 7A).

To further investigate this growth retardation
phenotype, the morphology of Cdi1-expressing cells was
examined. W303 cells were transformed with pJG4-4Cdi1, a
galactose-inducible vector that directs the synthesis of
Cdi1. Morphology of cells was examined with Nomarski
optics at 1000X magnification. As shown in Figure 8,
such microscopic examination of the cells showed that,
compared with controls, cells in which Cdi1 was expressed
were larger, and a subpopulation showed aberrant
morphologies: 5% of the cells formed elongated schmoos,
and 5% exhibited multiple buds. Immunofluorescent
examination of a sample of these cells which had been
DAPI stained (as described below) showed that the nuclei
of some of the largest cells were not condensed.

Finally, cells were examined for their ability to
bud. Samples of 400 cells from control populations and
from populations expressing Cdi1 were examined by phase
contrast microscopy, and the budding index was calculated
as the percentage of budded cells in each population as
described in Wittenberg and Reed (Mol. Cell. Biol.
9:4064-4068, 1989). As shown in Figure 7B, less than 10%
of the cells in the Cdi1-expressing population showed
buds, as opposed to 30% of the cells in the control
population, suggesting that fewer of the cells in the
population expressing Cdi1 had passed through the G1 to S
transition. This finding is consistent with the idea
that the increased cell size and growth retardation were
also due to a prolongation of G1.
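
The budding index defined above is a simple percentage. As a minimal sketch, the calculation might be written as follows in Python; the function name is hypothetical, and the cell counts are invented solely to reproduce the approximate percentages reported (they are not data from Figure 7B).

```python
def budding_index(budded: int, total: int) -> float:
    """Return the percentage of budded cells in a counted sample."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= budded <= total:
        raise ValueError("budded must lie between 0 and total")
    return 100.0 * budded / total


# Hypothetical counts for two samples of 400 cells each.
control_index = budding_index(120, 400)     # control population
expressing_index = budding_index(36, 400)   # population expressing the prey

print(f"control: {control_index:.1f}% budded")
print(f"expressing: {expressing_index:.1f}% budded")
```

With these illustrative counts the control sample gives 30% and the expressing sample gives 9%, i.e. below the 10% threshold mentioned in the text.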

This hypothesis was further tested by FACS
analysis of cellular DNA. In particular, W303 cells that
contained Cdi1 were grown as described above and diluted
to OD600 = 0.1 in 2% glucose or 1% raffinose, 1% galactose,
and grown to OD600 = 0.8-1.0. At this point, the cells were
collected, sonicated, fixed in 70% ethanol, stained with
propidium iodide, and subjected to FACS analysis to
determine DNA content as previously described (Lew et al.,
Cell 63:317-328, 1992). Approximately 20,000 events were
analyzed. These results, shown in Figure 7C, indicated
that the majority of the cells in the Cdi1-expressing
population had increased amounts of cellular DNA. This
may indicate that an increased number of cells were in S
phase; alternatively, it may simply be the result of
larger cell size and increased quantity of mitochondrial
DNA.

Taken together, these experiments thus indicated
that protracted Cdi1 expression in S. cerevisiae caused a
retardation in the passage of cells through the cell
cycle, most likely by increasing the proportion of cells
in G1; they thus also indicate that Cdi1 expression
uncoupled the normal synchrony between these two metrics
of cell cycle progression (budding and DNA content).

Because Cdi1 interacts with Cdc2 family proteins,
it was postulated that the Cdi1 growth retardation
phenotype in S. cerevisiae might be explained by
sequestration of Cdc28 into protein complexes that were
not competent to cause the cell to traverse G1. To test
this hypothesis, the effect of native Cdi1 expression in
cells containing Cdc28 with and without overexpressed
native human Cdc2 was compared. Specifically, W303 cells
that carried the indicated combinations of galactose-
inducible Cdi1 expression vector and/or Cdc2 expression
vector were grown for 14 h in complete minimal medium
lacking tryptophan and histidine in the presence of 2%
raffinose. Cells were then washed and diluted to
OD600 = 0.1 in the same media containing either 2% glucose,
or 1% raffinose and 1% galactose. Optical densities were
measured at two hour intervals for 12 hours. The results
of these growth assay experiments are shown in Figure 7A.

Unexpectedly, it was found that the presence of
additional Cdc2 increased the severity of the Cdi1-
dependent growth inhibition (Figure 7A). This result
suggested that Cdi1 endowed Cdc2 family proteins with a
new function, at least in S. cerevisiae, one that
inhibited their ability to cause cells to traverse G1 and
S. The Cdi1 and Cdc2 expression plasmids together also
caused some growth inhibition, even in glucose medium;
this result was attributed to leaky expression from the
GAL1 promoter on the expression plasmid.

Analysis of Cdi1 Function in Mammalian Cells

The above results in yeast suggested that Cdi1
might have a similar effect on the ability of mammalian
cells to traverse G1 or S. Since Cdi1 was isolated from
HeLa cDNA, the point in the cell cycle at which Cdi1 mRNA
was expressed in these cells was first measured.

Specifically, adherent HeLa cells were
synchronized in late G1 by a double thymidine block (Rao
and Johnson, Nature 225:159-164, 1970) as described in
Lew et al. (Cell 66:1197-1206, 1991). Aliquots of cells
were collected every three hours after release from the
block. Released cells reentered the cell cycle 9 hours
after release, as measured by FACS analysis of DNA
content. Total RNA was prepared from each aliquot at
different time points, run out on a formaldehyde agarose
gel, and blotted onto nylon (Nytran, Schleicher and
Schuell, Keene, NH) as described in Ausubel et al.
(Current Protocols in Molecular Biology, New York, John
Wiley & Sons, 1987). The blot was probed with random
primed DNA probes (Feinberg and Vogelstein, Anal.
Biochem. 132:6-13, 1983) made from a 690 bp EcoRI
fragment that contained Cdi1, a 1389 bp PstI fragment
of human cyclin E sequence (Lew et al., Cell
66:1197-1206, 1991), a 1228 bp NcoI-SphI fragment from the
coding sequence of the human cyclin B1 gene (Pines and
Hunter, Cell 58:833-846, 1989), and a 1268 bp PstI
fragment carrying the full length human glyceraldehyde-
phosphate-dehydrogenase (GAPD) gene (Tokunaga et al.,
Cancer Res. 47:5616-5619, 1987) which served as a
normalization control. As is shown in Figure 9A,
expression of Cdi1 mRNA peaks at the end of G1,
immediately before the G1 to S transition, in parallel
with the expression of the cyclin E message. This
temporal expression pattern was consistent with the
hypothesis that Cdi1 expression might affect the G1 to S
transition.

To further test this idea, HeLa cells were
transfected either with pBNCdi1, a construction that
directed the synthesis of Cdi1 under the control of the
Moloney Murine Leukemia Virus LTR (see below), or with
the vector alone. Individual transformed clones were
selected by their resistance to G418, and cells from
these clones were stained with propidium iodide and
subjected to FACS analysis to determine DNA content (as
described below). The midpoint of G1 was defined as the
mode of the distribution of each graph; the modes on the
two panels were of different heights (272 counts for
cells transformed with the vector, 101 counts for cells
that contained Cdi1); this broadened peak in the Cdi1-
expressing cells reflected the increased proportion of
the population that contains approximately 1X DNA
content. 4 independent transfectants were analysed; all
yielded similar results. These results, which are shown
in Figure 9B, indicated that the populations of cells in
which Cdi1 was expressed contained an increased
proportion of cells in G1 relative to control
populations.

Cdc2-Cdi1 Interaction

To identify determinants of Cdc2 recognized by
Cdi1, Cdi1 was tested for its ability to interact with a
panel of different bait proteins that included Cdc2
proteins from yeast, humans, and flies, as well as the
yeast Fus3 protein kinase (a protein kinase of the ERK
class which negatively regulates Cln3 and which, by
sequence criteria, is less related to the Cdc2 proteins
than those proteins are to one another) (Elion et al.,
Cell 60:649-664, 1990).

To perform these experiments, EGY48/JK103
(described below) containing a plasmid that directed the
galactose-inducible synthesis of tagged Cdi1 was
transformed with one of a series of different
transcriptionally-inert LexA-Cdc2 family protein baits.
Five individual transformants of each bait were grown to
OD600 = 0.5-1.0 in minimal medium that contained 2%
galactose but that lacked uracil, histidine, and
tryptophan. Results are shown in Table 1 and are given
in β-galactosidase units; variation among individual
transformants was less than 20%.

TABLE 1

Bait               β-Galactosidase Activity

LexA-Cdc2 (Hs)     1580
LexA-Cdk2 (Hs)     440
LexA-Cdc28 (Sc)    480
LexA-Cdc2 (Dm)     40
LexA-Cdc2c (Dm)    <2
LexA-Fus3 (Sc)     <2

As shown in Table 1, tagged Cdi1 stimulated
transcription from these baits to different levels; it
activated strongly in strains that contained the human
Cdc2 bait, against which it was selected, less strongly
in strains that contained S. cerevisiae Cdc28 or human
Cdk2 baits, and only weakly in strains that contained the
DmCdc2 bait, one of the two Drosophila Cdc2 homologs
(Jimenez et al., EMBO J. 9:3565-3571, 1990; Lehner and
O'Farrell, EMBO J. 9:3573-3581, 1990). In strains that
contained the DmCdc2c bait or Fus3, Cdi1 did not activate
at all. Since baits in this panel were related in
sequence, were made from the same vector, were translated
from a message that had the same 5' untranslated sequence
and the same LexA coding sequence, and were expressed in
yeast in the same amounts, the differences in
transcription among the bait strains very likely
reflected differences in interaction with the tagged
Cdi1.
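
To make the comparison among baits easier to read, the Table 1 values can be expressed relative to the human Cdc2 bait. The Python sketch below is only an illustration of that normalization; the dictionary and variable names are hypothetical, and the two baits at the limit of detection (for which the text reports no activation) are stored as None rather than as a number.

```python
# β-galactosidase units taken from Table 1. The two baits at the limit of
# detection are stored as None because no activation was observed.
table1 = {
    "LexA-Cdc2 (Hs)": 1580,
    "LexA-Cdk2 (Hs)": 440,
    "LexA-Cdc28 (Sc)": 480,
    "LexA-Cdc2 (Dm)": 40,
    "LexA-Cdc2c (Dm)": None,
    "LexA-Fus3 (Sc)": None,
}

# Use the bait against which the prey was originally selected as reference.
reference = table1["LexA-Cdc2 (Hs)"]

relative = {
    bait: (None if units is None else units / reference)
    for bait, units in table1.items()
}

for bait, fraction in relative.items():
    if fraction is None:
        print(f"{bait:<16} no detectable activation")
    else:
        print(f"{bait:<16} {fraction:6.1%} of human Cdc2 bait")
```

On this scale the Cdk2 and Cdc28 baits fall at roughly 30% of the human Cdc2 bait, and the DmCdc2 bait at under 3%.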

In order to identify residues on Cdc2 proteins
that Cdi1 might recognize, the transcription interaction
data were compared to the sequence of the baits. A lineup
of the bait sequences was searched for residues that were
conserved in the proteins with which Cdi1 interacted, but
which differed in the proteins that Cdi1 did not touch.
Use of this criterion identified 7 residues, which are
indicated by asterisks in Figure 10. Of these residues,
two, Glu 57 and Gly 154 (in human Cdc2), are altered in
the non-interacting baits to amino acids of different
chemical type. In DmCdc2c, residue 57 is changed from
Glu to Asn, and residue 154 from Gly to Asn; in Fus3,
these residues are changed to His and Asp. In human
Cdc2, both of these residues adjoin regions of the
molecule necessary for interaction with cyclins (Ducommun
et al., Mol. Cell. Biol. 11:6177-6184, 1991). Projection
of the human Cdc2 primary sequence on the crystal
structure solved by Knighton et al. for bovine cAMP
dependent protein kinase (Science 253:407-413, 1991)
suggests that residues 57 and 154 are in fact likely to
be close to these cyclin contact points in the folded
protein.
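
The search criterion just described (conserved among interacting baits, different in every non-interacting bait) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, and the short aligned sequences are invented toy data, not the bait sequences of Figure 10.

```python
def discriminating_positions(alignment, interactors, non_interactors):
    """Return (position, residue) pairs, 1-based, where all interactors
    share one residue and every non-interactor has a different one."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")

    hits = []
    for i in range(length):
        residues = {alignment[name][i] for name in interactors}
        if len(residues) != 1:
            continue  # not conserved among the interacting baits
        conserved = next(iter(residues))
        if conserved == "-":
            continue  # ignore alignment gaps
        if all(alignment[name][i] != conserved for name in non_interactors):
            hits.append((i + 1, conserved))
    return hits


# Invented toy alignment; names and sequences are hypothetical.
toy_alignment = {
    "bait_A": "MEDYTK",
    "bait_B": "MEDFTK",
    "bait_C": "MEDYSK",
    "bait_X": "MNDYTR",
    "bait_Y": "MHDYTQ",
}

hits = discriminating_positions(
    toy_alignment,
    interactors=["bait_A", "bait_B", "bait_C"],
    non_interactors=["bait_X", "bait_Y"],
)
print(hits)
```

In the toy alignment only positions 2 and 6 satisfy the criterion; a position that is conserved in every sequence, or that varies among the interactors, is not reported.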

These results are thus consistent with the idea
that Cdi1 may exert its effects by changing the affinity
of Cdc2 proteins for particular cyclins, thus potentially
altering their substrate specificity.

In summary, Cdi1 is a protein which complexes with
Cdc2 family proteins. It is expressed around the time of
the G1 to S transition, and the above results suggest
that it may negatively regulate passage of cells through
this part of the cycle, thus linking the regulatory
networks connecting extracellular signals with core cell
cycle controls. If Cdi1 is in fact a negative regulator,
it is interesting to note that its normal function may be
to convey signals that retard or block the cell cycle
during G1. Since both normal differentiation and cancer
can be considered consequences of changes in G1
regulation, this idea raises the possibilities that Cdi1
may function to remove cells from active cycle to allow
differentiation (Pardee, Science 246:603-608, 1989); and
that there are cancers in which lesions in the G1
regulatory machinery prevent Cdi1 from exerting its full
effect.

Experimental procedures

Bacteria and yeast

Manipulation of bacterial strains and of DNAs was
by standard methods (see, e.g., Ausubel et al., Current
Protocols in Molecular Biology, New York, John Wiley &
Sons, 1987; and Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor, NY, Cold Spring
Harbor Laboratory, 1989) unless otherwise noted. E. coli
"Sure" mcrA Δ(mrr, hsdRMS, mcrBC) endA1 supE44 thi-1
gyrA96 relA1 lac recB recJ sbcC umuC::Tn5 (kanR) uvrC
/F'[proAB, lacIqZΔM15]::Tn10 (tetR) (Stratagene Inc.,
La Jolla, CA) and KC8 (pyrF::Tn5 hsdR leuB600 trpC9830
lacΔ74 strA galK hisB436) were used as bacterial hosts
throughout.

To determine whether Cdi1 complemented either G1
or G2 functions of cdc28, the following yeast strains
were used: cdc28-1N (MATa ura3 ade1 trp1 cdc28-1N), which
at the restrictive temperature arrests predominantly in
G2; and cdc28-13 (MATa leu2 trp1 his3 ura3 ade1 tyr1
cdc28-13) and cdc28-17 (MATa leu2 trp1 his3 ura3 met14
arg5 arg6 tyr1 cdc28-17), which at the restrictive
temperature arrest predominantly during G1.

Into these strains was introduced pJG4-6Cdi1 (see
below), a yeast expression plasmid that directs the
synthesis of Cdi1 that contains a hemagglutinin epitope
tag at its amino terminus, and pJG4-7Cks2 (derived from
the same selection) as a positive control. Overnight
cultures of these strains were diluted 20:1 into Trp⁻
complete minimal medium with 2% glucose and 2% galactose
and grown at 25°C for five hours. Dilutions of these
cultures were plated onto duplicate plates of solid media
that contained the same carbon sources; one plate was
placed at 25°C and the other at 36°C. Colonies were
counted after five days of incubation.

In order to determine whether Cdi1 complemented a
strain deficient in G1 cyclins, strain 3C-1AX (MATa bar1
Δcln1 Δcln2 Δcln3 cyh2 trp1 leu2 ura2 ade1 his2 [pLEU2-
CYH2 (CYHS)-CLN3+]) into which pJG4-7Cdi1 or a GAL1-CLN3
construct as a positive control had been introduced was
used. Overnight cultures were diluted into glucose and
galactose medium as above, and grown for five hours at
30°C. Cells were plated onto glucose- and galactose-
containing medium as above, except that the medium also
contained 10 μg/ml cycloheximide; cells were grown for
three days and counted. Colonies can only arise on this
medium when the CYHS-CLN3+ plasmid is lost, an event which
itself can only occur if the other plasmid rescues the
Cln deficiency.

The ability of Cdi1 to cause resistance to arrest
by α factor was tested using a derivative of W303 (MATa
trp1 ura3 his3 leu2 can1 bar1::LEU2) into which pJG4-
4Cdi1, a plasmid that directs the synthesis of native
Cdi1, had been introduced. Strain W303 was also
transformed with a set of mammalian cDNAs that had been
isolated by their ability to confer α factor resistance
as a positive control. Overnight cultures were grown in
glucose and galactose as above, and then plated on
glucose and galactose medium, in the presence and absence
of 10⁻⁷ M α factor. Colonies were counted after 3 days.

For the growth rate experiments, W303 contained
either pJG4-4Cdi1 or a vector control, in combination
with either pJG14-2, a HIS3+ plasmid which directs the
synthesis in yeast of native human Cdc2 under the control
of the ADH1 promoter, or a vector control. Overnight
cultures which were grown in His⁻ Trp⁻ minimal medium that
contained 2% raffinose were collected, washed, and
diluted into fresh medium that contained either 2%
glucose or 1% galactose + 1% raffinose to OD600 = 0.1.
Growth kinetics were followed, measuring the OD of
aliquots taken every 2 hours.
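
Optical density readings taken at fixed intervals can be summarized as a doubling time. The text does not state how the growth curves were analysed, so the following Python sketch is an assumption on our part: it fits a straight line to ln(OD) versus time by least squares, which is one common approach. The function name and the OD values are hypothetical.

```python
import math


def doubling_time(times_h, ods):
    """Estimate doubling time (hours) from the least-squares slope of
    ln(OD) versus time, assuming exponential growth over the readings."""
    if len(times_h) != len(ods) or len(times_h) < 2:
        raise ValueError("need at least two paired readings")
    logs = [math.log(od) for od in ods]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    rate = sxy / sxx  # specific growth rate, per hour
    if rate <= 0:
        raise ValueError("culture is not growing over these readings")
    return math.log(2) / rate


# Hypothetical readings every 2 hours, starting from OD600 = 0.1.
times = [0, 2, 4, 6]
ods = [0.1, 0.2, 0.4, 0.8]

td = doubling_time(times, ods)
print(f"estimated doubling time: {td:.2f} h")
```

Because the invented readings double exactly every two hours, the estimate here is 2 hours; real readings would be expected to deviate from a perfect exponential, particularly at higher densities.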

In order to optimize operator occupancy, baits
were produced constitutively under the control of the
ADH1 promoter (Ammerer, Meth. Enzym. 101:192-210, 1983),
and contained the LexA C-terminal oligomerization region,
which contributes to operator occupancy by LexA-
containing proteins, perhaps because it aids in the
precise alignment of LexA amino termini on adjacent
operator half sites (Golemis and Brent, Mol. Cell. Biol.
12:3006-3014, 1992). It is worth noting that all LexA-
bait proteins so far examined enter the yeast nucleus in
concentrations sufficient to permit operator binding,
even though LexA derivatives are not specifically
localized to the nucleus unless they contain other
nuclear localization signals (see, e.g., Silver et al.,
Mol. Cell. Biol. 6:4763-4766, 1986).

pL202pl has been described (Ruden et al., Nature
350:426-430, 1991). This plasmid, a close relative of
pMA424 and pSH2-1 (Ma and Ptashne, Cell 51:113-119, 1987;
Hanes and Brent, Cell 57:1275-1283, 1989), carries the
HIS3⁺ marker and the 2µ replicator, and directs the
synthesis in yeast of fusion proteins that carry the
wild-type LexA protein at their amino terminus. Baits
used in this study were made as follows: human Cdc2 (Lee
and Nurse, Nature 327:31-35, 1987), Cdk2 (Tsai et al.,
Nature 353:174-177, 1991) and the S. cerevisiae CDC28
genes (Lorincz and Reed, Nature 307:183-185, 1984) were
amplified by PCR using Vent polymerase (New England
Biolabs, Beverly, MA) and cloned into pL202pl as
EcoRI-BamHI fragments. These fusion proteins contained two
amino acids (glu phe) inserted between the last amino acid
of LexA and the bait protein. The Drosophila Cdc2 (Jimenez
et al., EMBO J. 9:3565-3571, 1990; Lehner and O'Farrell,
EMBO J. 9:3573-3581, 1990) baits were cloned as
BamHI-SalI fragments following PCR amplification. LexA-Fus3
(Elion, Cell 60:649-664, 1990) and LexA-Cln3 (Cross, Mol.
Cell. Biol. 8:4675-4684, 1988; Nash et al., EMBO J.
7:4335-4346, 1988) were made in a similar way except that
they were cloned as BamHI fragments. The resulting fusions
contained five amino acids (glu phe pro gly ile) (SEQ ID
NO:2) inserted between LexA and the baits. All these fusions
contained the entire coding region from the second amino
acid to the stop codon. LexA-cMyc-Cterm contained the
carboxy-terminal 176 amino acids of human cMyc, and
LexA-Max contained all of the human Max coding sequence.
LexA-Bicoid (amino acids 2-160) has been described
(Golemis and Brent, Mol. Cell. Biol. 12:3006-3014, 1992).

Reporters 

In the interaction trap, one reporter, the LexAop-LEU2
construction, replaced the yeast chromosomal LEU2
gene. The other reporter, one of a series of
LexAop-GAL1-lacZ genes (Brent and Ptashne, Cell 43:729-736,
1985; Kamens et al., Mol. Cell. Biol. 10:2840-2847,
1990), was carried on a 2µ plasmid. The reporters were
designed so that their basal transcription was extremely
low, presumably due both to the removal of the entire
UAS from both reporters, and to the fact (whose
cause is unknown) that LexA operators introduced into
promoters tend to decrease transcription (Brent and
Ptashne, Nature 312:612-615, 1984; Lech, Gene activation
by DNA-bound Fos and Myc proteins, Ph.D. thesis, Harvard
University, 1990). Reporters were selected to differ in
their response to activation by LexA fusion proteins. In
this study, the LEU2 reporter contained three copies of
the high-affinity LexA binding site found upstream of E.
coli colE1 (Ebina et al., J. Biol. Chem. 258:13258-13261,
1983; Kamens et al., Mol. Cell. Biol. 10:2840-2847,
1990), and thus presumably binds a total of 6 dimers of
the bait. In contrast, the lacZ gene employed in the
primary screen contained a single lower affinity
consensus operator (Brent and Ptashne, Nature 312:612-615,
1984) which binds a single dimer of the bait. The
LexA operators in the LEU2 reporter were closer to the
transcription startpoint than they were in the lacZ
reporter. These differences in the number, affinity, and
position of the operators all contributed to making the
LEU2 gene a more sensitive indicator than the lacZ gene,
a property that is useful for this method.
p1840 and pJK103 have been described (Brent and
Ptashne, Cell 43:729-736, 1985; Kamens et al., Mol. Cell.
Biol. 10:2840-2847, 1990). pHR33 (Ellerstrom et al.,
Plant Mol. Biol. 18:557-566, 1992) was cut with HindIII
and an ~1166 bp fragment that contained the URA3⁺ gene
from yEP24M13-2, a derivative of yEP24, was introduced
into it to create pLEU2-0. This plasmid contains a BglII
site 87 nucleotides upstream of the major LEU2
transcription startpoint. pLEU2-0 was cut with BglII,
and a 42 bp double-stranded BglII-ended oligomer

5' GATCCTGCTGTATATAAAACCAGTGGTTATATGTACAGTACG 3' (SEQ ID NO:3)
3'     GACGACATATATTTTGGTCACCAATATACATGTCATGCCTAG 5' (SEQ ID NO:4)

that contains the overlapping LexA operators found
upstream of the colicin E1 gene (Ebina et al., J. Biol.
Chem. 258:13258-13261, 1983) and which presumably binds 2
LexA dimers, was introduced into it. One plasmid,
pLEU2-LexAop6, that contained three copies of this oligomer
was picked; it presumably binds 6 dimers of LexA fusion
proteins.
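The relationship between the two strands of the oligomer can be checked mechanically. The sketch below is an illustration and not part of the original construction: it assumes that SEQ ID NO:4 is written 3' to 5' as printed and that the annealed duplex carries 4-nucleotide 5' overhangs at both ends. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

TOP_5_TO_3 = "GATCCTGCTGTATATAAAACCAGTGGTTATATGTACAGTACG"     # SEQ ID NO:3
BOTTOM_3_TO_5 = "GACGACATATATTTTGGTCACCAATATACATGTCATGCCTAG"  # SEQ ID NO:4, as printed


def paired_with_overhang(top, bottom_3to5, overhang):
    """True if the bottom strand pairs with the top strand shifted by `overhang` bases."""
    paired_top = top[overhang:]
    paired_bottom = bottom_3to5[: len(paired_top)]
    if len(paired_bottom) != len(paired_top):
        return False
    return all(COMPLEMENT[t] == b for t, b in zip(paired_top, paired_bottom))


def overhangs_5_to_3(top, bottom_3to5, overhang):
    """Return the two single-stranded ends, each read 5' to 3'."""
    left = top[:overhang]
    right = bottom_3to5[len(top) - overhang:][::-1]
    return left, right


def dimers_bound(oligo_copies, dimers_per_oligo=2):
    """Presumed number of LexA dimers bound, given the text's 2 dimers per oligomer."""
    return oligo_copies * dimers_per_oligo


print(len(TOP_5_TO_3), len(BOTTOM_3_TO_5))
print(paired_with_overhang(TOP_5_TO_3, BOTTOM_3_TO_5, 4))
print(overhangs_5_to_3(TOP_5_TO_3, BOTTOM_3_TO_5, 4))
print(dimers_bound(3))
```

Both single-stranded ends read GATC, which is the overhang left by BglII digestion, and three copies of the oligomer correspond to the 6 presumed dimer binding sites of pLEU2-LexAop6.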
Selection strains

EGY12 (MATa trp1 ura2 LEU2::pLEU2-0 (ΔUASLEU2))
and EGY38 (as above but ::pLEU2-LexAop6) were constructed
as follows. pLEU2-0 and pLEU2-LexAop6 were linearized by
digestion with ClaI within the LEU2 gene, and the DNA was
introduced into U457 (MATa SUP53-a ade2-1 can1-100
ura3-52 trp1-1 [phi+]) by lithium acetate transformation (Ito
et al., J. Bacter. 153:163-168, 1983); Ura⁺ colonies,
which presumably contained the plasmid DNA integrated
into LEU2, were selected. Several of these transformants
were grown in YPD. Ura⁻ cells were selected by plating
these cultures on medium that contained 5-FOA (Ausubel et
al., Current Protocols in Molecular Biology, New York,
John Wiley & Sons, 1987). Both plasmids carry a TY1
element. For each integration, some of the ura3⁻
revertants were also trp1⁻, suggesting that the URA3⁺
marker was deleted in a homologous recombination event
that involved the TY1 sequences on the LEU2 plasmids and
the chromosomal TY1 element upstream of SUP53-a (Oliver
et al., Nature 357:38-46, 1992). Trp⁻ colonies from each
integration, EGY12 (no LexA operators) and EGY38 (6
operators), were saved. These were mated to GG100-14D
(MATα his3 trp1 pho5). The resulting diploids were
sporulated, and a number of random (MATa leu2⁻ ura3⁻
trp1⁻ his3⁻ GAL⁺) spore products were recovered. EGY40
and EGY48 are products of this cross; EGY40 has no LexA
operators, EGY48 has 6. To make the bait strains, EGY48
was transformed with p1840 or pJK103 and with the
different bait plasmids. Double transformants were
selected on Glucose Ura⁻ His⁻ plates, and expression of
the bait protein was confirmed by Western blotting using
anti-LexA antibody and standard techniques.
Library ("prey") expression vectors

Library-encoded proteins were expressed from pJG4-5,
a member of a series of expression plasmids designed
to be used in the interaction trap and to facilitate
analysis of isolated proteins. These plasmids all
carried the 2µ replicator, to ensure high copy number in
yeast, and the TRP1 marker. pJG4-5 was designed to
possess the following features: a galactose-inducible
promoter to allow conditional expression of the library
proteins, an epitope tag to facilitate their detection, a
nuclear localization signal to maximize their
intranuclear concentration in order to increase the
sensitivity of the selection, and a weak acid blob
activation domain (Ma and Ptashne, Cell 51:113-119,
1987). This domain was chosen for two reasons: because
its activity is not subject to known regulation by yeast
proteins as is the major GAL4 activation domain, and,
more importantly, because it is a weak activator,
presumably avoiding toxicity due to squelching or other
mechanisms (Gill and Ptashne, Nature 334:721-724, 1988;
Berger et al., Cell 70:251-265, 1992) very likely to
restrict the number or type of interacting proteins
recovered.

pJG4-5 was constructed as follows. An "expression
cassette" containing the GAL1 promoter and the ADH1
terminator and a 345 nt insert that encoded a 107 amino
acid moiety was inserted into pJG4-0, a plasmid that
carries the TRP1 gene, the 2µ replicator, the pUC13
replication origin, and the ampicillin resistance gene.
The pJG4-5 expression cassette directed the synthesis of
fusion proteins, each of which carried at the amino
terminus, in amino to carboxy terminal order, an ATG, an
SV40 nuclear localization sequence (PPKKKRKVA) (SEQ ID NO:5)
(Kalderon et al., Cell 39:499-509, 1984), the B42 acid
blob transcriptional activation domain (Ma and Ptashne,
Cell 51:113-119, 1987), and the HA1 epitope tag
(YPYDVPDYA) (SEQ ID NO:6) (Green et al., Cell 28:477-487,
1980) (Figure 3C). In addition to this plasmid,
these experiments used two Cdi1 expression plasmids.
EcoRI-XhoI Cdi1-containing fragments were introduced into
pJG4-4 to make the plasmid pJG4-4Cdi1; Cdi1 was
expressed from this plasmid as a native, unfused
protein under the control of the GAL1 promoter.
EcoRI-XhoI Cdi1-containing fragments were also introduced
into pJG4-6 to make the plasmid pJG4-6Cdi1; in this case,
Cdi1 was expressed as an in-frame fusion containing, at its
amino terminus, an ATG initiation codon and the
hemagglutinin epitope tag.
Library construction

The activation-tagged yeast cDNA expression
library was made from RNA isolated from serum-grown,
proliferating HeLa cells that were grown on plates to 70%
confluence. Total RNA was extracted as described in
Chomczynski and Sacchi (Anal. Biochem. 162:156-159,
1987), and polyA⁺ mRNA was purified on an oligo-dT
cellulose column. cDNA synthesis was performed according
to Gubler and Hoffman (Gene 25:263-269, 1983) as modified
by Huse and Hansen (Strategies 1:1-3, 1988) using a
linker primer that contained, 5' to 3', an 18 nt poly-dT
tract, an XhoI site, and a 25 nt long GA-rich sequence to
protect the XhoI site. To protect any internal XhoI
sites, the first strand was synthesized in the presence
of 5'-methyl-CTP (instead of CTP) with an RNAseH
defective version of the Moloney virus reverse
transcriptase (Superscript, BRL, Grand Island, NY). For
second strand synthesis, the mRNA/cDNA hybrid was treated
with RNAseH and E. coli DNA polymerase I, and the
resulting ends were made flush by sequential treatment
with Klenow, Mung Bean exonuclease, and Klenow, onto which
EcoRI adaptors:

5' AATTCGGCACGAGGCG 3' (SEQ ID NO: 7) 
3' GCCGTGCTCCGC 5' (SEQ ID NO: 8) 
were ligated, and the cDNA was digested with XhoI. This
DNA was further purified on a Sephacryl S-400 spin column
in order to remove excess adaptor sequences, and
fractionated on a 5-20% KOAc gradient. Fractions
containing >700 bp cDNAs were collected, and
approximately 1/5 of the cDNA was ligated into EcoRI- and
XhoI-digested pJG4-5. This ligation mixture was
introduced into E. coli SURE cells by electroporation
(Gene-Pulser, Bio-Rad, Hercules, CA) according to the
manufacturer's instructions. 9.6 × 10⁶ primary
transformants were collected by scraping LB ampicillin
plates. Colonies were pooled and grown in 6 liters of LB
medium overnight (approximately three generations), and
plasmid DNA was purified sequentially by standard
techniques on two CsCl gradients. Digestion of
transformants of individual library members with EcoRI
and XhoI revealed that >90% of the library members
contained a cDNA insert whose typical size ranged between
1 kb and 2 kb. Western blots of individual yeast
transformants using the anti-hemagglutinin monoclonal
antibody suggested that between 1/4 and 1/3 of the members
expressed fusion proteins.
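The library figures above imply a rough range for the number of primary clones that express a fusion protein. The following sketch simply multiplies the reported numbers; it is a back-of-the-envelope illustration, not a calculation given in the source, and it assumes that the 1/4 to 1/3 fraction applies to the whole population of primary transformants. The function names are hypothetical.

```python
PRIMARY_TRANSFORMANTS = 9.6e6      # reported number of primary E. coli transformants
MIN_INSERT_FRACTION = 0.90         # ">90%" of members carried a cDNA insert
EXPRESSING_FRACTIONS = (1 / 4, 1 / 3)  # reported range of members expressing fusions


def clones_with_insert(primary, insert_fraction):
    """Lower bound on the number of primary clones carrying a cDNA insert."""
    return primary * insert_fraction


def expressing_clone_range(primary, low_fraction, high_fraction):
    """Range of primary clones expected to express a fusion protein."""
    return primary * low_fraction, primary * high_fraction


low, high = expressing_clone_range(PRIMARY_TRANSFORMANTS, *EXPRESSING_FRACTIONS)
print(f"clones with insert: at least {clones_with_insert(PRIMARY_TRANSFORMANTS, MIN_INSERT_FRACTION):.2e}")
print(f"clones expressing a fusion: {low:.1e} to {high:.1e}")
```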
Selection of Cdc2 interactors 

Library transformation of the above-described
strain was performed according to the procedure described
by Ito et al. (J. Bacter. 153:163-168, 1983), except that
the cells were grown to a higher OD as described in
Schiestl and Gietz (Curr. Genet. 16:339-346, 1989) and
single-stranded carrier DNA was included in the
transformation mix, also as described in Schiestl and
Gietz (Curr. Genet. 16:339-346, 1989). This procedure
gave 1.2 × 10⁶ primary library transformants (10⁴ library
transformants/µg DNA). Transformants were selected on
Glucose Ura⁻ His⁻ Trp⁻ plates, scraped, suspended in
approximately 20 ml of 65% glycerol, 10 mM Tris-HCl pH
7.5, 10 mM MgCl₂, and stored in 1 ml aliquots at -80°C.
Plating efficiency was determined on Galactose Ura⁻ His⁻
Trp⁻ after growing 50 µl of a cell suspension in 5 ml YP
in the presence of 2% galactose. For screening the
library, approximately 20 colony-forming units on this
medium per original transformant (about 2 × 10⁷ cells) were
plated on 4 standard circular 10 cm Galactose Ura⁻ His⁻
Trp⁻ Leu⁻ plates after the YP/galactose induction
described above.
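The arithmetic behind the transformation and plating figures can be made explicit. The sketch below is illustrative only and uses hypothetical function names; the amount of library DNA and the per-plate load are inferred from the reported numbers and are not stated in the source.

```python
def dna_used_ug(transformants, transformants_per_ug):
    """Amount of library DNA implied by the yield and the efficiency."""
    return transformants / transformants_per_ug


def cfu_to_plate(transformants, cfu_per_transformant):
    """Total colony-forming units plated to cover the library at a given depth."""
    return transformants * cfu_per_transformant


primary = 1.2e6      # primary yeast library transformants
efficiency = 1e4     # transformants per microgram of DNA
coverage = 20        # colony-forming units plated per original transformant
plates = 4           # 10 cm selection plates

total_cfu = cfu_to_plate(primary, coverage)
print(dna_used_ug(primary, efficiency))  # micrograms of DNA implied
print(total_cfu)                         # compare with "about 2 x 10^7 cells"
print(total_cfu / plates)                # colony-forming units per plate
```

Plating each original transformant about 20 times over makes it unlikely that any one library member is missed in the selection by chance.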

412 Leu⁺ colonies appeared after a 4 day
incubation at 30°C. These colonies were collected on
Glucose Ura⁻ His⁻ Trp⁻ master plates and retested on
Glucose Ura⁻ His⁻ Trp⁻ Leu⁻, Galactose Ura⁻ His⁻ Trp⁻ Leu⁻,
Glucose Xgal Ura⁻ His⁻ Trp⁻, and Galactose Xgal Ura⁻ His⁻
Trp⁻ plates. 55 of these colonies showed galactose-dependent
growth on Leu⁻ media and galactose-dependent
blue color on Xgal medium, and were analyzed further.

Plasmid DNAs from these colonies were rescued as
described (Hoffman and Winston, Gene 57:267-272, 1987)
and introduced into the bacterial strain KC8, and
transformants were collected on Trp⁻ ampicillin plates.
Plasmid DNAs were analyzed and categorized by the pattern
of restriction fragments they gave on 1.8% agarose 1/2X
TBE gels after triple digestion with EcoRI and XhoI, and
either AluI or HaeIII. Characteristic plasmids from
different restriction map classes of these cDNAs were
retransformed into derivatives of EGY48 that expressed a
panel of different LexA fusion proteins. Plasmids that
carried cDNAs whose encoded proteins interacted with the
LexA-Cdc2 bait but not with other LexA fusion proteins,
including LexA-Bicoid, LexA-Fus3, LexA-Cln3,
LexA-cMyc-Cterm, and LexA-Max, were characterized further.
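The two screening criteria described above, galactose dependence of both reporter phenotypes followed by specificity for the LexA-Cdc2 bait, amount to a pair of logical filters. The sketch below restates them in code as an illustration; the class, the function names and the example colonies are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Colony:
    name: str
    grows_glucose_leu_minus: bool
    grows_galactose_leu_minus: bool
    blue_glucose_xgal: bool
    blue_galactose_xgal: bool


def is_galactose_dependent(colony: Colony) -> bool:
    """Both reporters must be active on galactose and inactive on glucose."""
    growth = colony.grows_galactose_leu_minus and not colony.grows_glucose_leu_minus
    color = colony.blue_galactose_xgal and not colony.blue_glucose_xgal
    return growth and color


def is_bait_specific(interactions: Dict[str, bool], target: str = "LexA-Cdc2") -> bool:
    """True if the prey scores with the target bait and with no other bait tested."""
    if not interactions.get(target, False):
        return False
    return not any(hit for bait, hit in interactions.items() if bait != target)


# Hypothetical colonies, for illustration only.
colonies = [
    Colony("c1", False, True, False, True),  # galactose-dependent on both reporters
    Colony("c2", True, True, True, True),    # active without the library protein
    Colony("c3", False, True, False, False), # Leu+ but not blue
]
print([c.name for c in colonies if is_galactose_dependent(c)])
print(is_bait_specific({"LexA-Cdc2": True, "LexA-Bicoid": False, "LexA-Max": False}))
```

Requiring galactose dependence ties each phenotype to expression of the library protein from the GAL1 promoter, and the second filter discards preys that score with unrelated baits.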
Microscopy 

5 ml cultures of yeast cells were grown in the
appropriate complete minimal medium up to OD600 ≈ 0.8-1 and
sonicated in a short burst to disrupt the clumps (Ausubel
et al., Current Protocols in Molecular Biology, New York,
John Wiley & Sons, 1987). The cells were collected by
centrifugation, washed in 1 ml TE, resuspended in 1 ml 70%
ethanol, and shaken for 1 hour at room temperature to fix
them, then collected and resuspended in TE. The fixed
cells were examined at 1000x magnification with a Zeiss
Axioscope microscope, either directly under
Nomarski optics or by fluorescence after staining with
2.5 µg/ml DAPI as described in Silver et al. (Mol. Cell.
Biol. 6:4763-4766, 1986).
FACS analysis 

Yeast cells were grown and fixed as described
above and prepared for FACS analysis of DNA content
essentially as in Lew et al. (Cell 63:317-328, 1992).
After fixation the cells were collected and washed three
times in 0.8 ml of 50 mM Tris/HCl pH 8.0, then 200 µl of
2 mg/ml RNase A was added and the cells were incubated at
37°C with continuous shaking for 5 hours. The cells were
pelleted, resuspended in 0.5 ml of 5 mg/ml pepsin (freshly
dissolved in 55 mM HCl) and incubated in a 37°C waterbath
for 30 minutes. The cells were spun down, washed with 1 ml
of 200 mM Tris/HCl pH 7.5, 211 mM NaCl, 78 mM MgCl₂ and
resuspended in the same buffer. 55 µl of 500 ng/ml
propidium iodide was then added, and cells were stained
overnight at 4°C. Typically 10,000-20,000 events were
read and analysed in a Becton Dickinson Fluorescence
Activated Cell Sorter (Becton Dickinson, Lincoln Park,
NJ) with the CellFIT Cell-Cycle Analysis program, Version
2.01.2.

For FACS analysis of DNA content, HeLa cells were
grown on plates and transfected (Ausubel et al., Current
Protocols in Molecular Biology, New York, John Wiley &
Sons, 1987) either with pBNCdi1, a DNA copy of a
retroviral cloning vector (Morgenstern and Land, Nucl.
Acids Res. 18:3587-3596, 1990) that directs expression
of native Cdi1 under the control of the MoMuLV promoter,
or with the vector alone. Clones of transfected cells
were selected by growth in medium that contained 400 µg/ml
of G418; Cdi1 expression did not diminish the number of
G418-resistant cells recovered. Individual clones from
each transfection (about 20) were rescued and grown on
plates in DMEM + 10% calf serum, collected using 0.05%
trypsin, 0.02% EDTA, and washed once with 1X PBS. Cells
from four clones derived from the Cdi1 transfection and
four from the control transfection were suspended in
225 µl of 30 µg/ml trypsin dissolved in 3.4 mM citrate,
0.1% NP40, 1.5 mM spermine and 0.5 mM Tris, and incubated
on a rotator for 10 minutes at room temperature. 188 µl
of 0.5 mg/ml trypsin inhibitor and 0.1 mg/ml RNase A
was then added and the suspension was vortexed. After
adding 188 µl of 0.4 mg/ml propidium iodide and 1 mg/ml
spermine, the samples were incubated for 30 minutes at
4°C. FACS analysis was carried out as described above.

Cdi1 Polypeptides and Antibodies

Polypeptide Expression

In general, polypeptides according to the
invention may be produced by transformation of a suitable
host cell with all or part of a Cdi1-encoding cDNA
fragment (e.g., the cDNA described above) in a suitable
expression vehicle.

Those skilled in the field of molecular biology
will understand that any of a wide variety of expression
systems may be used to provide the recombinant protein.
The precise host cell used is not critical to the
invention. The Cdi1 polypeptide may be produced in a
prokaryotic host (e.g., E. coli) or in a eukaryotic host
(e.g., Saccharomyces cerevisiae or mammalian cells, e.g.,
COS 1, NIH 3T3, or HeLa cells). Such cells are available
from a wide range of sources (e.g., the American Type
Culture Collection, Rockville, MD; also, see, e.g.,
Ausubel et al., Current Protocols in Molecular Biology,
John Wiley & Sons, New York, 1989). The method of
transformation or transfection and the choice of
expression vehicle will depend on the host system
selected. Transformation and transfection methods are
described, e.g., in Ausubel et al. (Current Protocols in
Molecular Biology, John Wiley & Sons, New York, 1989);
expression vehicles may be chosen from those provided,
e.g., in Cloning Vectors: A Laboratory Manual (P.H.
Pouwels et al., 1985, Supp. 1987).

One preferred expression system is the mouse 3T3
fibroblast host cell transfected with a pMAMneo
expression vector (Clontech, Palo Alto, CA). pMAMneo
provides: an RSV-LTR enhancer linked to a
dexamethasone-inducible MMTV-LTR promoter, an SV40 origin
of replication which allows replication in mammalian
systems, a selectable neomycin gene, and SV40 splicing
and polyadenylation sites. DNA encoding a Cdi1
polypeptide would be inserted into the pMAMneo vector in
an orientation designed to allow expression. The
recombinant Cdi1 protein would be isolated as described
below. Other preferable host cells which may be used in
conjunction with the pMAMneo expression vehicle include
COS cells and CHO cells (ATCC Accession Nos. CRL 1650 and
CCL 61, respectively).

Alternatively, a Cdi1 polypeptide is produced by a
stably-transfected mammalian cell line. A number of
vectors suitable for stable transfection of mammalian
cells are available to the public, e.g., see Pouwels et
al. (supra); methods for constructing such cell lines are
also publicly available, e.g., in Ausubel et al. (supra).
In one example, cDNA encoding the Cdi1 polypeptide is
cloned into an expression vector which includes the
dihydrofolate reductase (DHFR) gene. Integration of the
plasmid and, therefore, the Cdi1-encoding gene into the
host cell chromosome is selected for by inclusion of
0.01-300 µM methotrexate in the cell culture medium (as
described in Ausubel et al., supra). This dominant
selection can be accomplished in most cell types.
Recombinant protein expression can be increased by
DHFR-mediated amplification of the transfected gene. Methods
for selecting cell lines bearing gene amplifications are
described in Ausubel et al. (supra); such methods
generally involve extended culture in medium containing
gradually increasing levels of methotrexate.
DHFR-containing expression vectors commonly used for this
purpose include pCVSEII-DHFR and pAdD26SV(A) (described
in Ausubel et al., supra). Any of the host cells
described above or, preferably, a DHFR-deficient CHO cell
line (e.g., CHO DHFR⁻ cells, ATCC Accession No. CRL 9096)
are among the host cells preferred for DHFR selection of
a stably-transfected cell line or DHFR-mediated gene
amplification.

Once the recombinant Cdi1 protein is expressed, it
is isolated, e.g., using affinity chromatography. In one
example, an anti-Cdi1 antibody (e.g., produced as
described herein) may be attached to a column and used to
isolate the Cdi1 polypeptide. Lysis and fractionation of
Cdi1-harboring cells prior to affinity chromatography may
be performed by standard methods (see, e.g., Ausubel et
al., supra). Alternatively, a Cdi1 fusion protein, for
example, a Cdi1-maltose binding protein, a
Cdi1-β-galactosidase, or a Cdi1-trpE fusion protein, may be
constructed and used for isolation of Cdi1 protein (see,
e.g., Ausubel et al., supra; New England Biolabs,
Beverly, MA).

Once isolated, the recombinant protein can, if
desired, be further purified, e.g., by high performance
liquid chromatography (see, e.g., Fisher, Laboratory
Techniques In Biochemistry And Molecular Biology, eds.,
Work and Burdon, Elsevier, 1980).

Polypeptides of the invention, particularly short
Cdi1 fragments, can also be produced by chemical
synthesis (e.g., by the methods described in Solid Phase
Peptide Synthesis, 2nd ed., 1984, The Pierce Chemical Co.,
Rockford, IL).

These general techniques of polypeptide expression
and purification can also be used to produce and isolate
useful Cdi1 fragments or analogs (described below).

Anti-Cdi1 Antibodies

Human Cdi1 (or immunogenic fragments or analogues)
may be used to raise antibodies useful in the invention;
such polypeptides may be produced by recombinant or
peptide synthetic techniques (see, e.g., Solid Phase
Peptide Synthesis, supra; Ausubel et al., supra). The
peptides may be coupled to a carrier protein, such as KLH,
as described in Ausubel et al., supra. The KLH-peptide is
mixed with Freund's adjuvant and injected into guinea
pigs, rats, or preferably rabbits. Antibodies may be
purified by peptide antigen affinity chromatography.

Monoclonal antibodies may be prepared using the
Cdi1 polypeptides described above and standard hybridoma
technology (see, e.g., Kohler et al., Nature 256:495,
1975; Kohler et al., Eur. J. Immunol. 6:511, 1976; Kohler
et al., Eur. J. Immunol. 6:292, 1976; Hammerling et al.,
In Monoclonal Antibodies and T Cell Hybridomas, Elsevier,
NY, 1981; Ausubel et al., supra).

Once produced, polyclonal or monoclonal antibodies
are tested for specific Cdi1 recognition by Western blot
or immunoprecipitation analysis (by the methods described
in Ausubel et al., supra). Antibodies which specifically
recognize a Cdi1 polypeptide are considered to be useful
in the invention; such antibodies may be used, e.g., in
an immunoassay to monitor the level of Cdi1 produced by a
mammal.

Therapeutic and Diagnostic Uses for the Cdi1 Polypeptide

Therapy 

The Cdi1 polypeptide of the invention has been
shown to interact with a key regulator of human cell
division and to inhibit the in vivo proliferation of
yeast and human cells. Because of its role in the
control of cell division, Cdi1 is an unusually good
candidate for an anti-cancer therapeutic. Preferably,
this therapeutic is delivered as a sense or antisense RNA
product, for example, by expression from a retroviral
vector delivered, for example, to the bone marrow.
Treatment may be combined with more traditional cancer
therapies such as surgery, radiation, or other forms of
chemotherapy.

Alternatively, using the interaction trap system
described herein, a large number of potential drugs may
be easily screened, e.g., in yeast, for those which
increase or decrease the interaction between Cdi1 and
Cdc2. Drugs which increase Cdc2:Cdi1 interaction would
increase reporter gene expression in the instant system,
and conversely drugs which decrease Cdc2:Cdi1 interaction
would decrease reporter gene expression. Such drugs are
then tested in animal models for efficacy and, if
successful, may be used as anticancer therapeutics
according to their normal dosage and route of
administration.
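The readout of such a screen can be sketched as a comparison of reporter activity with and without the compound. The example below is a hypothetical illustration, not a procedure from the source: the function name, the 2-fold threshold and the activity values are all assumptions, and a real screen would also need controls for compounds that change reporter expression nonspecifically.

```python
def classify_compound(treated_activity, untreated_activity, fold_threshold=2.0):
    """Classify a compound by its effect on reporter activity.

    Activities are reporter measurements (e.g., beta-galactosidase units) from
    cells carrying the Cdc2 bait and the Cdi1 prey. The threshold is an
    arbitrary illustrative cutoff.
    """
    if untreated_activity <= 0 or treated_activity < 0:
        raise ValueError("activities must be non-negative, with a positive control value")
    ratio = treated_activity / untreated_activity
    if ratio >= fold_threshold:
        return "increases interaction"
    if ratio <= 1 / fold_threshold:
        return "decreases interaction"
    return "no clear effect"


# Hypothetical measurements, for illustration only.
print(classify_compound(treated_activity=250.0, untreated_activity=100.0))
print(classify_compound(treated_activity=30.0, untreated_activity=100.0))
print(classify_compound(treated_activity=110.0, untreated_activity=100.0))
```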

Detection of A Malignant Condition 
Cdi1 polypeptides may also find diagnostic use in
the detection or monitoring of cancerous conditions. In
particular, because Cdi1 is involved in the control of
cell division, a change in the level of Cdi1 production
may indicate a malignant or pre-malignant condition.
Levels of Cdi1 expression may be assayed by any standard
technique. For example, its expression in a biological
sample (e.g., a biopsy) may be monitored by standard
Northern blot analysis or may be aided by PCR (see, e.g.,
Ausubel et al., supra; PCR Technology: Principles and
Applications for DNA Amplification, ed., H.A. Erlich,
Stockton Press, NY; and Yap and McGee, Nucl. Acids Res.
19:4294, 1991). These techniques are enabled by the
provision of the Cdi1 sequence.

Alternatively, immunoassays may be used to detect
Cdi1 protein in a biological sample. Cdi1-specific
polyclonal, or preferably monoclonal, antibodies
(produced as described above) may be used in any standard
immunoassay format (e.g., ELISA, Western blot, or RIA
assay) to measure Cdi1 polypeptide levels; again
comparison would be to wild type Cdi1 levels, and a
change in Cdi1 production would be indicative of a
malignant or pre-malignant condition. Examples of
immunoassays are described, e.g., in Ausubel et al.,
supra. Immunohistochemical techniques may also be
utilized for Cdi1 detection. For example, a tissue
sample may be obtained from a patient, and a section
stained for the presence of Cdi1 using an anti-Cdi1
antibody and any standard detection system (e.g., one
which includes a secondary antibody conjugated to
horseradish peroxidase). General guidance regarding such
techniques can be found in, e.g., Bancroft and Stevens
(Theory and Practice of Histological Techniques,
Churchill Livingstone, 1982) and Ausubel et al. (supra).

In one particular example, a diagnostic method may
be targeted toward a determination of whether the Cdi1
gene of a mammal includes the N-terminal PEST
domain-encoding sequence. Because this sequence is very
likely to stabilize the Cdi1 protein, its deletion may
result in altered cellular levels of Cdi1 polypeptide and
therefore be indicative of a malignant or premalignant
condition. PEST deletions may be identified by either
standard nucleic acid or polypeptide analyses.

The Cdi1 polypeptide is also useful for
identifying that compartment of a mammalian cell where
important cell division control functions occur.
Antibodies specific for Cdi1 may be produced as described
above. The normal subcellular location of the protein is
then determined either in situ or using fractionated
cells by any standard immunological or
immunohistochemical procedure (see, e.g., Ausubel et al.,
supra; Bancroft and Stevens, Theory and Practice of
Histological Techniques, Churchill Livingstone, 1982).

The methods of the instant invention may be used
to reduce or diagnose the disorders described herein in
any mammal, for example, humans, domestic pets, or
livestock. Where a non-human mammal is treated, the Cdi1
polypeptide or the antibody employed is preferably
specific for that species.

Other Embodiments

In other embodiments, the invention includes any
protein which is substantially homologous to human Cdi1
(Fig. 6, SEQ ID NO: 1); such homologs include other
substantially pure naturally occurring mammalian Cdi1
proteins as well as allelic variations; natural mutants;
induced mutants; proteins encoded by DNA that hybridizes
to the Cdi1 sequence of Fig. 6 under high stringency
conditions or low stringency conditions (e.g., washing at
2X SSC at 40°C with a probe length of at least 40
nucleotides); and polypeptides or proteins specifically
bound by antisera directed to a Cdi1 polypeptide,
especially by antisera to the active site or to the Cdc2
binding domain of Cdi1. The term also includes chimeric
polypeptides that include a Cdi1 fragment.
The invention further includes analogs of any
naturally occurring Cdi1 polypeptide. Analogs can differ
from the naturally occurring Cdi1 protein by amino acid
sequence differences, by post-translational
modifications, or by both. Analogs of the invention will
generally exhibit at least 70%, more preferably 80%, even
more preferably 90%, and most preferably 95% or even 99%,
homology with all or part of a naturally occurring Cdi1
sequence. The length of comparison sequences will be at
least 8 amino acid residues, preferably at least 24 amino
acid residues, and more preferably more than 35 amino
acid residues. Modifications include in vivo and in
vitro chemical derivatization of polypeptides, e.g.,
acetylation, carboxylation, phosphorylation, or
glycosylation; such modifications may occur during
polypeptide synthesis or processing or following
treatment with isolated modifying enzymes. Analogs can
also differ from the naturally occurring Cdi1 polypeptide
by alterations in primary sequence. These include
genetic variants, both natural and induced (for example,
resulting from random mutagenesis by irradiation or
exposure to ethanemethylsulfate or by site-specific
mutagenesis as described in Sambrook, Fritsch and
Maniatis, Molecular Cloning: A Laboratory Manual (2d
ed.), CSH Press, 1989, hereby incorporated by reference;
or Ausubel et al., Current Protocols in Molecular
Biology, John Wiley & Sons, 1989, hereby incorporated by
reference). Also included are cyclized peptide
molecules and analogs which contain residues other than
L-amino acids, e.g., D-amino acids or non-naturally
occurring or synthetic amino acids, e.g., β or γ amino
acids.
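One simple way to apply the percentage thresholds above is to count identical residues over an ungapped comparison window. The sketch below is an assumption-laden illustration: the source does not say how homology is to be computed, real comparisons would normally use an alignment program, and the variant sequence shown is hypothetical. The reference window is residues 11 to 20 of SEQ ID NO:1 in one-letter code.

```python
MIN_COMPARISON_LENGTH = 8  # shortest comparison length named in the text


def percent_identity(reference, candidate):
    """Percent of positions that match between two equal-length, ungapped windows."""
    if len(reference) != len(candidate):
        raise ValueError("windows must have the same length")
    if len(reference) < MIN_COMPARISON_LENGTH:
        raise ValueError("comparison window is shorter than 8 residues")
    matches = sum(1 for r, c in zip(reference, candidate) if r == c)
    return 100.0 * matches / len(reference)


def meets_threshold(reference, candidate, threshold_percent=70.0):
    """True if the candidate reaches the stated percent identity over the window."""
    return percent_identity(reference, candidate) >= threshold_percent


reference_window = "MEPPSSIQTS"   # Met Glu Pro Pro Ser Ser Ile Gln Thr Ser
hypothetical_analog = "MEPPSAIQTS"  # one substitution, invented for this example

print(percent_identity(reference_window, hypothetical_analog))
print(meets_threshold(reference_window, hypothetical_analog, 95.0))
```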

In addition to full-length polypeptides, the
invention also includes Cdi1 polypeptide fragments. As
used herein, the term "fragment" means at least 10
contiguous amino acids, preferably at least 30 contiguous
amino acids, more preferably at least 50 contiguous amino
acids, and most preferably at least 60 to 80 or more
contiguous amino acids. Fragments of Cdi1 can be
generated by methods known to those skilled in the art or
may result from normal protein processing (e.g., removal
of amino acids from the nascent polypeptide that are not
required for biological activity or removal of amino
acids by alternative mRNA splicing or alternative protein
processing events).

Preferable fragments or analogs according to the
invention are those which exhibit biological activity
(for example, the ability to interfere with mammalian
cell division as assayed herein). Preferably, a Cdi1
polypeptide, fragment, or analog exhibits at least 10%,
more preferably 30%, and most preferably, 70% or more of
the biological activity of a full length naturally
occurring Cdi1 polypeptide.
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SEQUENCE LISTING 

(1) GENERAL INFORMATION : 

(i) APPLICANT: Brent, Roger
Gyuris, Jeno
Golemis, Erica

(ii) TITLE OF INVENTION: Interaction Trap System for
Isolating Novel Proteins

(iii) NUMBER OF SEQUENCES: 33 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson 

(B) STREET: 225 Franklin Street 
(C) CITY: Boston

(D) STATE: Massachusetts 

(E) COUNTRY: U.S.A. 

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 3.5" Diskette, 1.44 Mb 

(B) COMPUTER: IBM PS/2 Model 50Z or 55SX 

(C) OPERATING SYSTEM: MS-DOS (Version 5.0)

(D) SOFTWARE: WordPerfect (Version 5.1) 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 07/969,038 
(B) FILING DATE: 10/30/92
(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Clark, Paul T. 

(B) REGISTRATION NUMBER: 30,162 

(C) REFERENCE / DOCKET NUMBER: 00786/143001 



(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: (617) 542-5070 

(B) TELEFAX: (617) 542-8906 

(C) TELEX: 200154 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 1:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 804

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

GGC ACT GGT CTC GAC GTG GGG CGG CCA GCG ATG GAG CCG CCC AGT TCA 48
Gly Thr Gly Leu Asp Val Gly Arg Pro Ala Met Glu Pro Pro Ser Ser
1 5 10 15

ATA CAA ACA AGT GAG TTT GAC TCA TCA GAT GAA GAG CCT ATT GAA GAT 96
Ile Gln Thr Ser Glu Phe Asp Ser Ser Asp Glu Glu Pro Ile Glu Asp
20 25 30

GAA CAG ACT CCA ATT CAT ATA TCA TGG CTA TCT TTG TCA CGA GTG AAT 144
Glu Gln Thr Pro Ile His Ile Ser Trp Leu Ser Leu Ser Arg Val Asn
35 40 45

TGT TCT CAG TTT CTC GGT TTA TGT GCT CTT CCA GGT TGT AAA TTT AAA 192
Cys Ser Gln Phe Leu Gly Leu Cys Ala Leu Pro Gly Cys Lys Phe Lys
50 55 60

GAT GTT AGA AGA AAT GTC CAA AAA GAT ACA GAA GAA CTA AAG AGC TGT 240
Asp Val Arg Arg Asn Val Gln Lys Asp Thr Glu Glu Leu Lys Ser Cys
65 70 75 80

GGT ATA CAA GAC ATA TTT GTT TTC TGC ACC AGA GGG GAA CTG TCA AAA 288
Gly Ile Gln Asp Ile Phe Val Phe Cys Thr Arg Gly Glu Leu Ser Lys
85 90 95

TAT AGA GTC CCA AAC CTT CTG GAT CTC TAC CAG CAA TGT GGA ATT ATC 336
Tyr Arg Val Pro Asn Leu Leu Asp Leu Tyr Gln Gln Cys Gly Ile Ile
100 105 110

ACC CAT CAT CAT CCA ATC GCA GAT GGA GGG ACT CCT GAC ATA GCC AGC 384
Thr His His His Pro Ile Ala Asp Gly Gly Thr Pro Asp Ile Ala Ser
115 120 125

TGC TGT GAA ATA ATG GAA GAG CTT ACA ACC TGC CTT AAA AAT TAC CGA 432
Cys Cys Glu Ile Met Glu Glu Leu Thr Thr Cys Leu Lys Asn Tyr Arg
130 135 140

AAA ACC TTA ATA CAC TGC TAT GGA GGA CTT GGG AGA TCT TGT CTT GTA 480
Lys Thr Leu Ile His Cys Tyr Gly Gly Leu Gly Arg Ser Cys Leu Val
145 150 155 160

GCT GCT TGT CTC CTA CTA TAC CTG TCT GAC ACA ATA TCA CCA GAG CAA 528
Ala Ala Cys Leu Leu Leu Tyr Leu Ser Asp Thr Ile Ser Pro Glu Gln
165 170 175

GCC ATA GAC AGC CTG CGA GAC CTA AGA GGA TCC GGG GCA ATA CAG ACC 576
Ala Ile Asp Ser Leu Arg Asp Leu Arg Gly Ser Gly Ala Ile Gln Thr
180 185 190

ATC AAG CAA TAC AAT TAT CTT CAT GAG TTT CGG GAC AAA TTA GCT GCA 624
Ile Lys Gln Tyr Asn Tyr Leu His Glu Phe Arg Asp Lys Leu Ala Ala
195 200 205

CAT CTA TCA TCA AGA GAT TCA CAA TCA AGA TCT GTA TCA AGA 666
His Leu Ser Ser Arg Asp Ser Gln Ser Arg Ser Val Ser Arg
210 215 220

TAAAGGAATT CAAATAGCAT ATATATGACC ATGTCTGAAA TGTCAGTTCT CTAGCATAAT 726

TTGTATTGAA ATGAAACCAC CAGTGTTATC AACTTGAATG TAAATGTACA TGTGCAGATA 786
TTCCTAAAGT TTTATTGA 804



(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 2: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Glu Phe Pro Gly Ile
1 5

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 3: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 42 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

GATCCTGCTG TATATAAAAC CAGTGGTTAT ATGTACAGTA CG 42 



(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 4: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 42 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
GACGACATAT ATTTTGGTCA CCAATATACA TGTCATGCCT AG 42



(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 5: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 9 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

Pro Pro Lys Lys Lys Arg Lys Val Ala
1 5 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 6: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 9

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Tyr Pro Tyr Asp Val Pro Asp Tyr Ala 
1 5 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 7: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
AATTCGGCAC GAGGCG 16

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 8:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 12

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

GCCGTGCTCC GC 12

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 9:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 73

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Glu Asp Tyr Thr Lys Ile Glu Lys Ile Gly Glu Gly Thr Tyr Gly
1 5 10 15

Val Val Tyr Lys Gly Arg Lys Lys Thr Thr Gly Gln Val Val Ala Met
20 25 30

Lys Lys Ile Arg Leu Glu Ser Glu Glu Glu Gly Val Pro Ser Thr Ala
35 40 45

Ile Arg Glu Ile Ser Leu Leu Lys Glu Leu Arg His Pro Asn Ile Val
50 55 60

Ser Leu Gln Asp Val Leu Met Gln Asp
65 70

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 10: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 73 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Glu Asn Phe Gln Lys Val Glu Lys Ile Gly Glu Gly Thr Tyr Gly
1 5 10 15

Val Val Tyr Lys Ala Arg Asn Lys Leu Thr Gly Glu Val Val Ala Leu
20 25 30

Lys Lys Ile Arg Leu Asp Thr Glu Thr Glu Gly Val Pro Ser Thr Ala
35 40 45

Ile Arg Glu Ile Ser Leu Leu Lys Glu Leu Asn His Pro Asn Ile Val
50 55 60

Lys Leu Leu Asp Val Ile His Thr Glu
65 70

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 11:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 82 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Ser Gly Glu Leu Ala Asn Tyr Lys Arg Leu Glu Lys Val Gly Glu
1 5 10 15

Gly Thr Tyr Gly Val Val Tyr Lys Ala Leu Asp Leu Arg Pro Gly Gln
20 25 30

Gly Gln Arg Val Val Ala Leu Leu Lys Lys Ile Arg Leu Glu Ser Glu
35 40 45

Asp Glu Gly Val Pro Ser Thr Ala Ile Arg Glu Ile Ser Leu Leu Lys
50 55 60

Glu Leu Lys Asp Asp Asn Ile Val Arg Leu Tyr Asp Ile Val His Ser
65 70 75 80

Asp Ala

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 12: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 73

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

Met Glu Asp Phe Glu Lys Ile Glu Lys Ile Gly Glu Gly Thr Tyr Gly
1 5 10 15

Val Val Tyr Lys Gly Arg Asn Arg Leu Thr Gly Gln Ile Val Ala Met
20 25 30

Lys Lys Ile Arg Leu Glu Ser Asp Asp Glu Gly Val Pro Ser Thr Ala
35 40 45

Ile Arg Glu Ile Ser Leu Leu Lys Glu Leu Lys His Glu Asn Ile Val
50 55 60

Cys Leu Glu Asp Val Leu Met Glu Glu
65 70

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 13: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 77

(B) TYPE: amino acid

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Thr Thr Ile Leu Asp Asn Phe Gln Arg Ala Glu Lys Ile Gly Glu
1 5 10 15

Gly Thr Tyr Gly Ile Val Tyr Lys Ala Arg Ser Asn Ser Thr Gly Gln
20 25 30

Asp Val Ala Leu Lys Lys Ile Arg Glu Leu Gly Glu Thr Glu Gly Val
35 40 45

Pro Ser Thr Ala Ile Arg Glu Ile Ser Leu Leu Lys Asn Leu Lys His
50 55 60

Pro Asn Val Val Gln Leu Phe Asp Val Val Ile Ser Gly
65 70 75

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 14: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 86

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

Met Pro Lys Arg Ile Val Tyr Asn Ile Ser Ser Asp Phe Gln Leu Lys
1 5 10 15

Ser Leu Leu Gly Glu Gly Ala Tyr Gly Val Val Cys Ser Ala Thr His
20 25 30

Lys Pro Thr Gly Glu Ile Val Ala Ile Lys Lys Ile Glu Pro Phe Asp
35 40 45

Lys Pro Leu Phe Ala Leu Arg Thr Leu Arg Glu Ile Lys Ile Leu Lys
50 55 60

His Phe Lys His Glu Asn Ile Ile Thr Ile Phe Asn Ile Gln Arg Pro
65 70 75 80

Asp Ser Phe Glu Asn Phe
85



(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 15:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 84 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY : linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 



Ser Arg Leu Tyr Leu Ile Phe Glu Phe Leu Ser Met Asp Leu Lys Lys
1 5 10 15

Tyr Leu Asp Ser Ile Pro Pro Gly Gln Tyr Met Asp Ser Ser Leu Val
20 25 30

Lys Ser Tyr Leu Tyr Gln Ile Leu Gln Gly Ile Val Phe Cys His Ser
35 40 45

Arg Arg Val Leu His Arg Asp Leu Lys Pro Gln Asn Leu Leu Ile Asp
50 55 60

Asp Lys Gly Thr Ile Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe
65 70 75 80

Gly Ile Pro Ile



(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 16:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:

Asn Lys Leu Tyr Leu Val Phe Glu Phe Leu His Gln Asp Leu Lys Lys
1 5 10 15

Phe Met Asp Ala Ser Ala Leu Thr Gly Ile Pro Leu Pro Leu Ile Lys
20 25 30

Ser Tyr Leu Phe Gln Leu Leu Gln Gly Leu Ala Pro Cys His Ser His
35 40 45

Arg Val Leu His Arg Asp Leu Lys Pro Gln Asn Leu Leu Ile Asn Thr
50 55 60

Glu Gly Ala Ile Lys Leu Ala Asp Phe Gly Leu Ala Arg Ala Phe Gly
65 70 75 80

Val Pro Val

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 17:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 84

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

His Lys Leu Tyr Leu Val Phe Glu Phe Leu Asp Leu Asp Leu Lys Arg
1 5 10 15

Tyr Met Glu Gly Ile Pro Lys Asp Gln Pro Leu Gly Ala Asp Ile Val
20 25 30

Lys Lys Phe Met Met Gln Leu Cys Lys Gly Ile Ala Tyr Cys His Ser
35 40 45

His Arg Ile Leu His Arg Asp Leu Lys Pro Gln Asn Leu Leu Ile Asn
50 55 60

Lys Asp Gly Asn Leu Lys Leu Gly Asp Phe Gly Leu Ala Arg Ala Phe
65 70 75 80

Gly Val Pro Leu

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 18: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 84 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

Asn Arg Ile Tyr Leu Ile Phe Glu Phe Leu Ser Met Asp Leu Lys Lys
1 5 10 15

Tyr Met Asp Ser Leu Pro Val Asp Lys His Met Glu Ser Glu Leu Val
20 25 30

Arg Ser Tyr Leu Tyr Gln Ile Thr Ser Ala Ile Leu Phe Cys His Arg
35 40 45

Arg Arg Val Leu His Arg Asp Leu Lys Pro Gln Asn Leu Leu Ile Asp
50 55 60

Lys Ser Gly Leu Ile Lys Val Ala Asp Phe Gly Leu Gly Arg Ser Phe
65 70 75 80

Gly Ile Pro Val

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 19: 
(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 82

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

Asn Asn Leu Tyr Met Ile Phe Glu Tyr Leu Asn Met Asp Leu Lys Lys
1 5 10 15

Leu Met Asp Lys Lys Lys Asp Val Phe Thr Pro Gln Leu Ile Lys Ser
20 25 30

Tyr Met His Gln Ile Leu Asp Ala Val Gly Phe Cys His Thr Asn Arg
35 40 45

Ile Leu His Arg Asp Leu Lys Pro Gln Asn Leu Leu Val Asp Thr Ala
50 55 60

Gly Lys Ile Lys Leu Ala Asp Phe Gly Leu Ala Arg Ile Phe Asn Val
65 70 75 80

Pro Met

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 20: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 86 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:

Asn Glu Val Tyr Ile Ile Gln Glu Leu Met Gln Thr Asp Leu His Arg
1 5 10 15

Val Ile Ser Thr Gln Met Leu Ser Asp Asp His Ile Gln Tyr Phe Ile
20 25 30

Tyr Gln Thr Leu Arg Ala Val Lys Val Leu Glu Gly Ser Asn Val Ile
35 40 45

His Arg Asp Leu Lys Pro Ser Asn Leu Leu Ile Asn Ser Asn Cys Asp
50 55 60

Leu Lys Val Cys Asp Phe Gly Leu Ala Arg Ile Ile Asp Glu Ser Ala
65 70 75 80

Ala Asp Asn Ser Glu Pro
85

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 21: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Arg Val Tyr Thr His Glu Val Val Thr Leu Trp Tyr Arg Ser Pro Glu 
1 5 10 15 

Val Leu Leu Gly Ser Ala Arg Tyr Ser Thr Pro Val Asp Ile Trp Ser
20 25 30

Ile Gly Thr Ile Phe Ala Glu Leu Ala Thr Lys Lys Pro Leu Phe His
35 40 45

Gly Asp Ser Glu Ile Asp Gln Leu Phe Arg Ile Phe Arg Ala Leu Gly
50 55 60

Thr Pro Asn Asn Glu Val Trp Pro Glu Val Glu Ser Leu Gln Asp Tyr
65 70 75 80 

Lys Asn Thr 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 22: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:

Arg Thr Tyr Thr His Glu Val Val Thr Leu Trp Tyr Arg Ala Pro Glu
1 5 10 15

Ile Leu Leu Gly Cys Lys Tyr Tyr Ser Thr Ala Val Asp Ile Trp Ser
20 25 30

Leu Gly Cys Ile Phe Ala Glu Met Val Thr Arg Arg Ala Leu Phe Pro
35 40 45

Gly Asp Ser Glu Ile Asp Gln Leu Phe Arg Ile Phe Arg Thr Leu Gly
50 55 60

Thr Pro Asp Glu Val Val Trp Pro Gly Val Thr Ser Met Pro Asp Tyr
65 70 75 80

Lys Pro Ser

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 23: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:

Arg Ala Tyr Thr His Glu Ile Val Thr Leu Trp Tyr Arg Ala Pro Glu
1 5 10 15

Val Leu Leu Gly Gly Lys Gln Tyr Ser Thr Gly Val Asp Thr Trp Ser
20 25 30

Ile Gly Cys Ile Phe Ala Glu Met Cys Asn Arg Lys Pro Ile Phe Ser
35 40 45

Gly Asp Ser Glu Ile Asp Gln Leu Phe Lys Ile Phe Arg Val Leu Gly
50 55 60

Thr Pro Asn Glu Ala Ile Trp Pro Asp Ile Val Tyr Leu Pro Asp Phe
65 70 75 80

Lys Pro Ser

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 24:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

Arg Ile Tyr Thr His Glu Ile Val Thr Leu Trp Tyr Arg Ala Pro Glu
1 5 10 15

Val Leu Leu Gly Ser Pro Arg Tyr Ser Cys Pro Val Asp Ile Trp Ser
20 25 30

Ile Gly Cys Ile Phe Ala Glu Met Ala Thr Arg Lys Pro Leu Phe Gln
35 40 45

Gly Asp Ser Glu Ile Asp Gln Leu Phe Lys Ile Phe Arg Val Leu Gly
50 55 60

Thr Pro Asn Glu Ala Ile Trp Pro Asp Ile Val Tyr Leu Pro Asp Phe
65 70 75 80

Lys Pro Ser

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 25:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 83 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

Arg Ala Tyr Thr His Glu Val Val Thr Leu Trp Tyr Arg Ala Pro Glu
1 5 10 15

Ile Leu Leu Gly Thr Lys Phe Tyr Ser Thr Gly Val Asp Ile Trp Ser
20 25 30

Leu Gly Cys Ile Phe Ser Glu Met Ile Met Arg Arg Ser Leu Phe Pro
35 40 45

Gly Asp Ser Glu Ile Asp Gln Leu Tyr Arg Ile Phe Arg Thr Leu Ser
50 55 60

Thr Pro Asp Glu Thr Asn Trp Pro Gly Val Thr Gln Leu Pro Asp Phe
65 70 75 80 

Lys Thr Lys 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 26: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 90

(B) TYPE: amino acid

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

Thr Gly Gln Gln Ser Gly Met Thr Glu Tyr Val Ala Thr Arg Trp Tyr
1 5 10 15

Arg Ala Pro Glu Val Met Leu Thr Ser Ala Lys Tyr Ser Arg Ala Met
20 25 30

Asp Val Trp Ser Cys Gly Cys Ile Leu Ala Glu Leu Phe Leu Arg Arg
35 40 45

Pro Ile Phe Pro Gly Arg Asp Tyr Arg His Gln Leu Leu Leu Ile Phe
50 55 60

Gly Ile Ile Gly Thr Pro His Ser Asp Asn Asp Leu Arg Cys Ile Glu
65 70 75 80

Ser Pro Arg Ala Arg Glu Tyr Ile Lys Ser
85 90

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 27: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

Phe Pro Lys Trp Lys Pro Gly Ser Leu Ala Ser His Val Lys Asn Leu 
1 5 10 15

Asp Glu Asn Gly Leu Asp Leu Leu Ser Lys Met Leu Ile Tyr Asp Pro
20 25 30

Ala Lys Arg Ile Ser Gly Lys Met Ala Leu Asn His Pro Tyr Phe Asn
35 40 45

Asp Leu Asp Asn Gln Ile Lys Lys Met
50 55 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 28: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 59 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

Phe Pro Lys Trp Ala Arg Gln Asp Phe Ser Lys Val Val Pro Pro Leu
1 5 10 15

Asp Glu Asp Gly Ile Asp Leu Leu Asp Lys Leu Leu Ala Tyr Asp Pro
20 25 30

Asn Lys Arg Ile Ser Ala Lys Ala Ala Leu Ala His Pro Phe Thr Gln
35 40 45

Asp Val Thr Lys Pro Val Pro His Leu Arg Leu
50 55

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 29: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

Phe Pro Gln Trp Arg Arg Lys Asp Leu Ser Asn Gln Leu Lys Asn Leu
1 5 10 15

Asp Ala Asn Gly Ile Asp Leu Ile Gln Lys Met Leu Ile Tyr Asp Pro
20 25 30

Val His Arg Ile Ser Ala Lys Asp Ile Leu Glu His Pro Tyr Phe Asn
35 40 45

Gly Phe Gln Ser Gly Leu Val Arg Asn
50 55 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 30: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 57

(B) TYPE: amino acid 

(C) STRANDEDNESS : 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:

Phe Pro Gln Trp Arg Arg Lys Asp Leu Ser Asn Gln Leu Lys Asn Leu
1 5 10 15

Asp Ala Asn Gly Ile Asp Leu Ile Gln Lys Met Leu Ile Tyr Asp Pro
20 25 30

Val His Arg Ile Ser Ala Lys Asp Ile Leu Glu His Pro Tyr Phe Asn
35 40 45

Gly Phe Gln Ser Gly Leu Val Arg Asn
50 55 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 31: 
(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 72 

(B) TYPE: amino acid 

(C) STRANDEDNESS:

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

Phe Pro Arg Trp Glu Gly Thr Asn Met Pro Gln Pro Ile Thr Glu His
1 5 10 15

Glu Ala His Glu Leu Ile Met Ser Met Leu Cys Tyr Asp Pro Asn Leu
20 25 30

Arg Ile Ser Ala Lys Asp Ala Leu Gln His Ala Tyr Phe Arg Asn Val
35 40 45

Gln His Val Asp His Val Ala Leu Pro Val Asp Pro Asn Ala Gly Ser
50 55 60

Ala Ser Arg Leu Thr Arg Leu Val
65 70 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 32: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 60 

(B) TYPE: amino acid 

(C) STRANDEDNESS: 

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

Leu Pro Met Tyr Pro Ala Ala Pro Leu Glu Lys Met Phe Pro Arg Val 
1 5 10 15

Asn Pro Lys Gly Ile Asp Leu Leu Gln Arg Met Leu Val Phe Asp Pro
20 25 30

Ala Lys Arg Ile Thr Ala Lys Glu Ala Leu Glu His Pro Tyr Leu Gln
35 40 45 

Thr Tyr His Asp Pro Asn Asp Glu Pro Glu Gly Glu 
50 55 60 

(2) INFORMATION FOR SEQUENCE IDENTIFICATION NUMBER: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 

AAG CTT ATG GGT GCT CCT CCA AAA AAG AAG AGA AAG GTA GCT GGT ATC 48
Lys Leu Met Gly Ala Pro Pro Lys Lys Lys Arg Lys Val Ala Gly Ile
1 5 10 15

AAT AAA GAT ATC GAG GAG TGC AAT GCC ATC ATT GAG CAG TTT ATC GAC 96
Asn Lys Asp Ile Glu Glu Cys Asn Ala Ile Ile Glu Gln Phe Ile Asp
20 25 30

TAC CTG CGC ACC GGA CAG GAG ATG CCG ATG GAA ATG GCG GAT CAG GCG 144
Tyr Leu Arg Thr Gly Gln Glu Met Pro Met Glu Met Ala Asp Gln Ala
35 40 45

ATT AAC GTG GTG CCG GGC ATG ACG CCG AAA ACC ATT CTT CAC GCC GGG 192
Ile Asn Val Val Pro Gly Met Thr Pro Lys Thr Ile Leu His Ala Gly
50 55 60

CCG CCG ATC CAG CCT GAC TGG CTG AAA TCG AAT GGT TTT CAT GAA ATT 240
Pro Pro Ile Gln Pro Asp Trp Leu Lys Ser Asn Gly Phe His Glu Ile
65 70 75 80

GAA GCG GAT GTT AAC GAT ACC AGC CTC TTG CTG AGT GGA GAT GCC TCC 288
Glu Ala Asp Val Asn Asp Thr Ser Leu Leu Leu Ser Gly Asp Ala Ser
85 90 95

TAC CCT TAT GAT GTG CCA GAT TAT GCC TCT CCC GAA TTC GGC CGA CTC 336
Tyr Pro Tyr Asp Val Pro Asp Tyr Ala Ser Pro Glu Phe Gly Arg Leu
100 105 110

GAG AAG CTT 345
Glu Lys Leu
115
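
The paired nucleotide and amino acid lines in the listing above can be cross-checked by translating the codons with the standard genetic code. The following Python sketch is offered only as an illustration of that check and is not part of the sequence listing. The names `CODON_TABLE` and `translate` are assumptions, and the input is the first 48 nucleotides of SEQ ID NO: 33 as printed above.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string (whitespace ignored) into one-letter residues."""
    dna = "".join(dna.split()).upper()
    # Only complete codons are translated; a trailing partial codon is dropped.
    return "".join(CODON_TABLE[dna[i:i + 3]]
                   for i in range(0, len(dna) - len(dna) % 3, 3))


# First line of SEQ ID NO: 33 (nucleotides 1 to 48).
first_line = "AAG CTT ATG GGT GCT CCT CCA AAA AAG AAG AGA AAG GTA GCT GGT ATC"
protein = translate(first_line)
print(protein)
```

The result can be compared residue by residue with the three-letter line printed beneath the codons (Lys Leu Met Gly Ala Pro Pro Lys Lys Lys Arg Lys Val Ala Gly Ile).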
Claims

1. A method for determining whether a first
protein is capable of physically interacting with a
second protein, comprising:

(a) providing a host cell which contains

(i) a reporter gene operably linked to a
protein binding site;

(ii) a first fusion gene which expresses a
first fusion protein, said first fusion protein
comprising said first protein covalently bonded to a
binding moiety which is capable of specifically binding
to said protein binding site; and

(iii) a second fusion gene which expresses a
second fusion protein, said second fusion protein
comprising said second protein covalently bonded to a
weak gene activating moiety; and

(b) measuring expression of said reporter gene as
a measure of an interaction between said first and said
second proteins.

2. The method of claim 1, further comprising
isolating the gene encoding said second protein.

3. The method of claim 1, wherein said weak gene
activating moiety is of lesser activation potential than
GAL4 activation region II.

4. The method of claim 3, wherein said weak gene
activating moiety is the B42 activation domain.

5. The method of claim 1, wherein said host cell 
is a yeast cell. 

6. The method of claim 1, wherein said reporter 
gene comprises the LEU2 gene or the lacZ gene. 
7. The method of claim 1, wherein said host cell
further contains a second reporter gene operably linked
to said protein binding site.

8. The method of claim 1, wherein said protein
binding site is a LexA binding site and said binding
moiety comprises a LexA DNA binding domain.

9. The method of claim 1, wherein said second
protein is a protein involved in the control of
eukaryotic cell division.

10. The method of claim 9, wherein said cell
division control protein is encoded by a Cdc2 gene.

11. A substantially pure preparation of Cdi1
polypeptide.

12. The polypeptide of claim 11, comprising an
amino acid sequence substantially identical to the amino
acid sequence shown in Figure 6 (SEQ ID NO: 1).

13. Purified DNA comprising a sequence encoding a 
polypeptide of claims 11 or 12. 

14. The purified DNA of claim 13, wherein said
DNA is cDNA.

15. The purified DNA of claim 11, wherein said
DNA encodes a human Cdi1 polypeptide.

16. A vector comprising the purified DNA of claim
15.

17. A cell containing the purified DNA of claim
15.

18. A method of producing a recombinant Cdi1
polypeptide comprising,

providing a cell transformed with DNA encoding a
Cdi1 polypeptide positioned for expression in said cell;

culturing said transformed cell under conditions
for expressing said DNA; and

isolating said recombinant Cdi1 polypeptide.

19. A purified antibody which binds specifically
to a polypeptide of claims 11 or 12.

20. A method of detecting a malignant cell in a
biological sample, said method comprising measuring Cdi1
gene expression in said sample, a change in Cdi1
expression relative to a wild-type sample being
indicative of the presence of said malignant cell.
molecules released from the beads (while the beads are
retained) enabling the compounds to diffuse into the
surrounding medium. The effects, such as plaques with a
bacterial lawn, can be observed. Zones of growth
inhibition or growth activation or effects on gene
expression can then be visualized and the beads at the
center of the zone picked and analyzed.

One assay scheme will involve gels where the molecule or
system, e.g. cell, to be acted upon may be embedded
substantially homogeneously in the gel. Various gelling
agents may be used such as polyacrylamide, agarose,
gelatin, etc. The particles may then be spread over the
gel so as to have sufficient separation between the
particles to allow for individual detection. If the
desired product is to have hydrolytic activity, a
substrate is present in the gel which would provide a
fluorescent product. One would then screen the gel for
fluorescence and mechanically select the particles
associated with the fluorescent signal.

One could have cells embedded in the gel, in effect
creating a cellular lawn. The particles would be spread
out as indicated above. Of course, one could place a grid
over the gel defining areas of one or no particle. If
cytotoxicity were the criterion, one could release the
product, incubate for a sufficient time, followed by
spreading a vital dye over the gel. Those cells which
absorbed the dye or did not absorb the dye could then be
distinguished.

As indicated above, cells can be genetically engineered so
as to indicate when a signal has been transduced. There
are many receptors for which the genes are known whose
expression is activated. By inserting an exogenous gene
into a site where the gene is under the transcriptional
control of the promoter responsive to such receptor, an
enzyme can be produced which provides a detectable signal,
e.g. a fluorescent signal. The particle associated with
the fluorescent cell(s) may then be analyzed for its
reaction history.

Libraries and Kits 

For convenience, libraries and/or kits may be provided.
The libraries would comprise the particles to which a
library of products and tags have been added so as to
allow for screening of the products bound to the bead or
the libraries would comprise the products removed from
the bead and grouped singly or in a set of 10 to 100 to
1000 members for screening. The kits would provide
various reagents for use as tags in carrying out the
library syntheses. The kits will usually have at least 4,
usually 5, different compounds in separate containers,
more usually at least 10, and may comprise at least 10²
different separated organic compounds, usually not more
than about 10², more usually not more than about 36
different compounds. For binary determinations, the mode
of detection will usually be common to the compounds
associated with the analysis, so that there may be a
common chromophore, a common atom for detection, etc.
Where each of the identifiers is pre-prepared, each will
be characterized by having a distinguishable composition
encoding choice and stage which can be determined by a
physical measurement and including groups or all of the
compounds sharing at least one common functionality.

Alternatively, the kit can provide reactants which can be
combined to provide the various identifiers. In this
situation, the kit will comprise a plurality of separated
first functional, frequently bifunctional, organic
compounds, usually four or more, generally one for each
stage of the synthesis, where the functional organic
compounds share the same functionality and are
distinguishable as to at least one determinable
characteristic. In addition, one would have at least one,
usually at least two, second organic compounds capable of
reacting with a functionality of the functional organic
compounds and capable of forming mixtures which are
distinguishable as to the amount of each of said second
organic compounds. For example, one could have a glycol,
amino acid, or a glycolic acid, where the various
bifunctional compounds are distinguished by the number of
fluorine or chlorine atoms present, to define stage, and
have an iodomethane, where one iodomethane has no
radioisotope, another has ¹⁴C and another has one or more
³H. By using two or more of the iodomethanes, one could
provide a variety of mixtures which could be determined by
their radioemissions. Alternatively, one could have a
plurality of second organic compounds, which could be used
in a binary code.

As indicated previously one could react the tags after
release with a molecule which allows for detection. In
this way the tags could be quite simple, having the same
functionality for linking to the particle as to the
detectable moiety. For example, by being linked to a
hydroxycarboxyl group, an hydroxyl group would be
released, which could then be esterified or etherified
with the molecule which allows for detection. For
example, by using combinations of fluoro- and chloroalkyl
groups, in the binary mode, the number of fluoro and/or
chloro groups could determine choice, while the number of
carbon atoms would indicate stage.
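
The binary mode described above, in which halogen count carries the choice and carbon count carries the stage, can be sketched in Python. This is an illustrative model only, not a described procedure: the mapping of halogen count to bit position, the constant `BASE_CARBONS`, and the function names `encode` and `decode` are all assumptions. Choices are numbered from 1 so that every stage leaves at least one tag on the particle.

```python
from typing import Dict, Set, Tuple

Tag = Tuple[int, int]  # (number of carbon atoms, number of halogen atoms)

BASE_CARBONS = 2  # hypothetical chain length used for stage 0


def encode(stage: int, choice: int, n_bits: int = 3) -> Set[Tag]:
    """Return the set of tags attached at `stage` to record `choice`."""
    if not 1 <= choice < 2 ** n_bits:
        raise ValueError("choice must fit in n_bits and be at least 1")
    carbons = BASE_CARBONS + stage  # carbon count marks the stage
    # A tag bearing (bit + 1) halogens is present when that bit is set.
    return {(carbons, bit + 1) for bit in range(n_bits) if (choice >> bit) & 1}


def decode(tags: Set[Tag]) -> Dict[int, int]:
    """Recover {stage: choice} from the tags detected on one particle."""
    history: Dict[int, int] = {}
    for carbons, halogens in tags:
        stage = carbons - BASE_CARBONS
        history[stage] = history.get(stage, 0) | (1 << (halogens - 1))
    return history


# A particle that saw choice 5 at stage 0, choice 3 at stage 1, choice 6 at stage 2.
bead_tags = encode(0, 5) | encode(1, 3) | encode(2, 6)
print(decode(bead_tags))
```

With three distinguishable tags per stage, seven choices per stage can be recorded, which is the economy a binary code offers over one tag per choice.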


Groups of compounds of particular interest include linkers
joined to a substituted ortho-nitrobenzyloxy group,
indanyloxy or fluorenyloxy group, or other group which
allows for photolytic or other selective cleavage. The
linking group may be an alkylene group of from 2 to 20
carbon atoms, polyalkyleneoxy, particularly alkyleneoxy of
from 2 to 3 carbon atoms, cycloalkyl group of from 4 to 8
carbon atoms, haloalkyl group, particularly fluoroalkyl of
from 2 to 20 carbon atoms, one or more aromatic rings and
the like, where the linker provides for the discrimination
between the various groups, by having different numbers of
units and/or substituents.

Individual particles or a plurality of particles could be
provided as articles of commerce, particularly where the
particle(s) have shown a characteristic of interest.
Based on the associated tags, the reaction history may be
decoded. The product may then be produced in a large
synthesis. Where the reaction history unequivocally
defines the structure, the same or analogous reaction
series may be used to produce the product in a large
batch. Where the reaction history does not unambiguously
define the structure, one would repeat the reaction
history in a large batch and use the resulting product for
structural analysis. In some instances it may be found
that the reaction series of the combinatorial chemistry
may not be the preferred way to produce the product in
large amounts.

Thus, an embodiment of this invention is a kit comprising
a plurality of separated organic compounds, each of the
compounds characterized by having a distinguishable
composition, encoding at least one bit of different
information which can be determined by a physical
measurement, and sharing at least one common
functionality. A preferred embodiment is a kit comprising
at least 4 different functional organic compounds.

More preferred is a kit wherein said functional organic 
compounds are of the formula: 

F¹-F²-C-E-C'          (I)

where F¹-F² is a linker which allows for attachment to and
detachment from a solid particle; and
C-E-C' is a tag member which can be determined by a
physical measurement, especially wherein said functional
organic compounds differ by the number of methylene groups
and/or halogens, nitrogens or sulfurs present.

Also preferred is a kit wherein the C-E-C' portion is
removed photochemically, or a kit wherein the C-E-C'
portion is removed oxidatively, hydrolytically,
thermolytically, or reductively.

Compounds of this invention may be useful as analgesics
and/or for the treatment of inflammatory disease,
especially in the case of the azatricyclics acting as
antagonists of the neurokinin 1/bradykinin receptor. Members
of the benzodiazepine library may be useful as a muscle
relaxant and/or tranquilizer and/or as a sedative.
Members of the 23 million Mixed Amide Library may be of
use as endothelin antagonists in the treatment of
hypertension or Raynaud's syndrome.

The following examples are offered by way of illustration 
and not by way of limitation.

In one embodiment the invention is a composition comprising
at least 6 different components, each component having a
distinguishable moiety. The components may be
characterized by each moiety being substantially
chemically stable or inert and having an identifiable
characteristic different from each of the other moieties.
Each moiety is joined to a linking group having an active
functionality capable of forming a covalent bond, through
a linking group to individually separable solid surfaces,
or joined to a group which is detectable at less than 1
nanomole, with the proviso that when the moieties are
joined to the linking group, the components are physically
segregated. Preferably, the solid supports are beads.
In one embodiment each component comprises molecules of
different compounds bound to individual separable solid
surfaces, wherein the molecules on the solid surfaces.
Preferably, the moieties of the invention define a
homologous series and/or a series of substitutions on a
core molecule.

The invention herein is also directed to a compound
library comprising at least one hundred unique solid
supports. In this compound library each solid support has
(1) an individual compound bound to the solid support as
a major compound bound to the support; and (2) a plurality
of tags, e.g., tags incapable of being sequenced, where the
tags are individual tag molecules which are physically
distinguishable in being physically separable and are
substituted so as to be detectable at less than about a
nanomole or have a functional group for bonding to a
substituent which is detectable at less than about a
nanomole. Preferably, in the compound library each solid
support has at least about 6 tags. In another embodiment,
in the compound library the tags define a binary code
encoding the synthetic protocol used for synthesizing
the compound on the solid support.

This invention also provides a method for determining a
synthetic protocol encoded by separable physically
different tags in a series and defining a binary code. In
this method at least two tags are employed to define each
stage of the synthetic protocol, there being at least six
tags. The method comprises separating the tags
by means of their physical differences and detecting the
tags to define a binary line encoding the protocol, whereby
the synthetic protocol is determined in accordance with
the binary line.

Compounds of this invention may be useful as analgesics
and/or for the treatment of inflammatory disease,
especially in the case of the azatricyclics acting as
antagonists of the neurokinin 1/bradykinin receptor.
Members of the benzodiazepine library may be useful as a
muscle relaxant and/or tranquilizer and/or as a sedative.
Members of the 23 Mixed Amide Library may be of use as
endothelin antagonists in the treatment of hypertension or
Raynaud's syndrome.

EXAMPLE 1 

PEPTIDE LIBRARY

In order to encode up to 10^9 different syntheses, one could
prepare 30 different identifiers which carry individual
tags capable of being separated one from another by
capillary GC. For encoding a smaller number of syntheses,
fewer identifiers would be used. The tags would normally
be prepared from commercially available chemicals as
evidenced by the following illustration.
ω-Hydroxyalkenes-1, where the number of methylene groups
would vary from 1 to 5, would be reacted with an
iodoperfluoroalkane, where the number of CF2 groups would
be 3, 4, 6, 8, 10, and 12. By employing a free-radical
catalyst, the iodoperfluorocarbon would add to the double
bond, where the iodo group could then be reduced with
hydrogen and a catalyst or a tin hydride. In this manner,
30 different tags could be prepared. The chemical
procedure is described by Haszeldine and Steele, J. Chem.
Soc. (London), 1199 (1953); Brace, J. Fluor. Chem., 20,
313 (1982). The highly fluorinated tags can be easily
detected by electron capture, have different GC retention
times, so that they are readily separated by capillary GC,
are chemically inert due to their fluorinated hydrocarbon
structure, and each bears a single hydroxyl functional
group for direct or indirect attachment to particles.
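
The tag count in this illustration is simple combinatorics, and
the following Python sketch makes the arithmetic explicit. It
assumes that every methylene count is paired with every CF2
count; the function name and the (n_CH2, n_CF2) tuple labels are
hypothetical and serve only as an illustration.

```python
from itertools import product

# Chain lengths stated in the text.
METHYLENE_COUNTS = [1, 2, 3, 4, 5]
CF2_COUNTS = [3, 4, 6, 8, 10, 12]


def enumerate_tags():
    """Return one (n_CH2, n_CF2) pair per distinct tag (illustrative labels)."""
    return list(product(METHYLENE_COUNTS, CF2_COUNTS))


tags = enumerate_tags()
n_tags = len(tags)        # 5 x 6 combinations
capacity = 2 ** n_tags    # each tag is either present or absent on a bead
print(n_tags, capacity)
```

With 30 present-or-absent tags, the number of distinct tag
combinations is 2^30, which is slightly more than the 10^9
syntheses mentioned above.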

Before attachment to compound precursors, the tags
(referred to as T1-T30) would be activated in a way which
is appropriate for the chemical intermediates to be used
in the combinatorial synthesis. By appropriate it is
intended that a functionality would be added which allows
for ready attachment by a chemical bond to a compound
precursor or to the bead matrix itself. The activation
process would be applied to each of the 30 different tags
and allow these tags to be chemically bound, either
directly or indirectly, to intermediates in the
combinatorial compound synthesis. For example, a carboxy
derivative could be used for coupling and upon activation
the resulting carboxy group would bond to the particle.

In the case of a combinatorial synthesis of a peptidic
compound or other structure made of amide-linked organic
fragments, the encoding process could consist of addition
of a carboxylic acid-equipped linker. For example, the
tag would be coupled to the tert-butyl ester of
o-nitro-p-carboxybenzyl bromide in the presence of sodium
hydride. The ester would then be hydrolyzed in dilute
trifluoroacetic acid.

Activated identifiers would be coupled to intermediates at
each stage in the combinatorial compound synthesis. The
ortho-nitrobenzyl ether part of the activated identifiers
is used to allow photochemical detachment of the tags
after completing the combinatorial synthesis and selecting
the most desirable compounds. The detached tags would
then be decoded using capillary GC with electron capture
detection to yield a history of the synthetic stages used
to prepare the compound selected.

While there is an almost unlimited set of chemical stages
and methods which could be used to prepare combinatorial
libraries of compounds, we will use coupling of α-amino
acids to make a combinatorial library of peptides as an
example of an application of the encoding methodology. In
this example, we will describe preparation of a library of
pentapeptides having all combinations of 16 different
amino acids at each of the five residue positions. Such
a library would contain 16^5 members. To uniquely encode
all members of this library, 20 detachable tags (T1-T20)
as described above would be required.

To prepare the encoded library, we would begin with a
large number (>10^6) of polymer beads of the type used for
Merrifield solid phase synthesis and functionalized by
free amino groups. We would divide the beads into 16
equal portions and place a portion in each of 16 different
reaction vessels (one vessel for each different α-amino
acid to be added). We would then add a small portion
(e.g., 1 mol%) of identifiers to each of the amino acid
derivatives (e.g., Fmoc amino acids) to be coupled in the
first stage of the combinatorial synthesis. The specific
combination of the tags incorporated into the identifiers
added would represent a simple binary code which
identifies the amino acid used in the first stage of
synthesis. The 16 amino acids added would be indicated by
numbers 1-16 and any such number could be represented
chemically by combinations of the first four tags (T1-T4).
In Tables 2 and 3, a typical encoding scheme is shown in
which the presence or absence of a tag is indicated by a
1 or a 0, respectively. The letter T may represent
either the tag or the identifier incorporating that
tag.

Table 2. A typical encoding scheme.

Amino acid added in first stage   T4  T3  T2  T1
Number 1 (e.g., glycine)           0   0   0   0
Number 2 (e.g., alanine)           0   0   0   1
Number 3 (e.g., valine)            0   0   1   0
Number 4 (e.g., serine)            0   0   1   1
Number 5 (e.g., threonine)         0   1   0   0
  ...
Number 16 (e.g., tryptophan)       1   1   1   1
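
The rows of Table 2 follow a regular pattern: amino acid number
n is written as the four-bit binary value of n - 1, with T1 as
the least significant bit. A minimal Python sketch of that
mapping is given here; the function names and the first_tag
parameter are hypothetical, and the n - 1 rule is inferred from
the rows shown in the table rather than stated in the text.

```python
def tags_for_choice(number, first_tag=1, n_bits=4):
    """Return the set of tag indices to add for reagent `number` (1-based).

    The code value is number - 1, so Number 1 -> 0000 and Number 16 -> 1111.
    `first_tag` is the index of the least significant tag of the stage.
    """
    if not 1 <= number <= 2 ** n_bits:
        raise ValueError("reagent number out of range for this many tags")
    code = number - 1
    return {first_tag + bit for bit in range(n_bits) if (code >> bit) & 1}


def table_row(number, first_tag=1, n_bits=4):
    """Render the presence (1) or absence (0) of tags, highest tag first."""
    present = tags_for_choice(number, first_tag, n_bits)
    order = range(first_tag + n_bits - 1, first_tag - 1, -1)
    return "".join("1" if tag in present else "0" for tag in order)


for n in (1, 2, 3, 4, 5, 16):
    print(n, table_row(n))
```

Calling tags_for_choice(2, first_tag=5) gives the tag set for the
same reagent number in a later stage that uses T5-T8.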



We would then carry out a standard
dicyclohexylcarbodiimide (DCC) peptide coupling in each of
the 16 vessels using the Fmoc amino acids admixed with small
amounts of the encoding activated identifiers as indicated
above. During the couplings, the amino acids as well as
small amounts (e.g., 1%) of the identifiers would become
chemically bound to intermediates attached to the beads.

Next the beads would be thoroughly mixed and again
separated into 16 portions. Each portion would again be
placed in a different reaction vessel. A second amino
acid admixed with appropriate new activated identifiers
(T5-T8) would be added to each vessel and DCC coupling
would be carried out as before. The particular mixture of
the incorporated tags (T5-T8) would again represent a
simple binary code for the amino acid added in this, the
second stage of the combinatorial synthesis.

Table 3. A typical encoding scheme.

Amino acid added in second stage  T8  T7  T6  T5
Number 1 (e.g., glycine)           0   0   0   0
Number 2 (e.g., alanine)           0   0   0   1
Number 3 (e.g., valine)            0   0   1   0
Number 4 (e.g., serine)            0   0   1   1
Number 5 (e.g., threonine)         0   1   0   0
  ...
Number 16 (e.g., tryptophan)       1   1   1   1



After the 16 couplings of stage 2 are complete, the beads
would be again mixed and then divided into 16 new portions
for the third stage of the synthesis. For the third
stage, T9-T12 would be used to encode the third amino acid
bound to the beads using the same scheme used for stages
1 and 2. After the third couplings, the procedure would
be repeated two more times using the fourth amino acids
with T13-T16 and the fifth amino acids with T17-T20 to
give the entire library of 1,048,576 different peptides
bound to beads.
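
The split-and-mix procedure of this example can be simulated in a
few lines. The sketch below is only an illustration under stated
assumptions: beads are modeled as dictionaries, mixing is a
random shuffle, portions are assigned round-robin, and the tag
code for amino acid number n is the binary value of n - 1 as in
Tables 2 and 3. All names are hypothetical.

```python
import random

N_CHOICES = 16   # amino acids per stage
N_STAGES = 5     # residue positions
BITS = 4         # tags per stage (T1-T4, T5-T8, ...)


def split_and_mix(n_beads, seed=0):
    """Simulate the encoded synthesis and return the list of beads."""
    rng = random.Random(seed)
    beads = [{"history": [], "tags": set()} for _ in range(n_beads)]
    for stage in range(N_STAGES):
        rng.shuffle(beads)                       # thorough mixing
        for i, bead in enumerate(beads):
            number = i % N_CHOICES + 1           # vessel 1..16, equal portions
            bead["history"].append(number)
            code = number - 1
            for bit in range(BITS):
                if (code >> bit) & 1:
                    bead["tags"].add(stage * BITS + bit + 1)
    return beads


def read_history(tags):
    """Recover the amino acid number used at each stage from a tag set."""
    history = []
    for stage in range(N_STAGES):
        code = sum(1 << bit for bit in range(BITS)
                   if stage * BITS + bit + 1 in tags)
        history.append(code + 1)
    return history


beads = split_and_mix(1000)
print(beads[0]["history"], sorted(beads[0]["tags"]))
```

In this model every bead's tag set reproduces its own synthetic
history, which is the property the encoding relies on.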


Although the above beads would be visually
indistinguishable, any bead may be chosen (e.g., by
selecting based on the interesting chemical or biological
properties of its bound peptide or other target molecule)
and its synthetic history may be learned by detaching and
decoding the associated tags.

The precise method used to detach tags will depend upon
the particular linker used to chemically bind them to
intermediates in the combinatorial synthesis of the target
compound. In the example above, the ortho-nitrobenzyl
carbonate linkages, which are known to be unstable to
~300 nm light (Ohtsuka et al., J. Am. Chem. Soc., 100,
8210 [1978]), would be cleaved by photochemical
irradiation of the beads. The tags would then diffuse
from the beads into free solution which would be injected
into a capillary gas chromatograph (GC) equipped with a
sensitive electron capture detector. Since the order in
which the tags (T1-T20) emerged from the GC and their
retention times under standard conditions were previously
determined, the presence or absence of any of T1-T20 would
be directly determined by the presence or absence of their
peaks in the GC chromatogram. If 1 and 0 represent the
presence and absence respectively of peaks corresponding
to T1-T20, then the chromatogram can be taken as a
20-digit binary number which can uniquely represent each
possible synthesis leading to each member of the peptide
library. The use of halocarbon tags which are safe,
economical and detectable at subpicomole levels by
electron capture detection makes this capillary GC method
a particularly convenient encoding scheme for the purpose.

As an example of using the encoding scheme for the
pentapeptide library above, a particular bead is
irradiated with light to detach the tags, the solubilized
labels injected into a capillary GC and the following
chromatogram obtained ("Peak" line):

Label    20 19 18 17   16 15 14 13   12 11 10  9    8  7  6  5    4  3  2  1   GC Inject
Peak      |  |  |  |       |                |  |             |          |
Binary    1  1  1  1    0  1  0  0    0  0  1  1    0  0  0  1    0  0  1  0
Stage        5             4             3             2             1
AA       Tryptophan    Threonine       Serine       Alanine        Valine

The "Label" line diagrams the GC chromatogram where T20-T1
peaks (|) are to be found (note the injection is given on
the right and the chromatogram reads from right to left).
The "Peak" line represents the presence of labels (T20-T1)
as peaks in the chromatogram. The "Binary" line gives
presence (1) or absence (0) of peaks as a binary number.
The "Stage" line breaks up the binary number into the five
different parts encoding the five different stages in the
synthesis. Finally, the "AA" line gives the identity of
the amino acid which was added in each stage and was given
by the binary code in the "Binary" line above.
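
The reading of the "Binary" line can be reproduced with a short
decoding routine. The sketch below assumes the Table 2
convention (amino acid number = binary value + 1) and includes
only the amino acid names given as examples in Tables 2 and 3;
the function name and dictionary are hypothetical.

```python
# Only the example names listed in Tables 2 and 3 are included here.
AMINO_ACIDS = {
    1: "glycine", 2: "alanine", 3: "valine",
    4: "serine", 5: "threonine", 16: "tryptophan",
}


def decode_chromatogram(binary, bits_per_stage=4):
    """Decode a T20..T1 presence/absence string into stage 1..N amino acids."""
    digits = binary.replace(" ", "")
    if len(digits) % bits_per_stage:
        raise ValueError("binary string does not split into whole stages")
    groups = [digits[i:i + bits_per_stage]
              for i in range(0, len(digits), bits_per_stage)]
    groups.reverse()                  # the rightmost group encodes stage 1
    names = []
    for group in groups:
        number = int(group, 2) + 1    # Table 2: code 0000 is Number 1
        names.append(AMINO_ACIDS.get(number, f"Number {number}"))
    return names


print(decode_chromatogram("1111 0100 0011 0001 0010"))
```

The returned list is ordered from stage 1 to stage 5, so for the
chromatogram above it starts with valine and ends with
tryptophan.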



EXAMPLE 2

RADIO-LABELED TAGS

In the next illustration, the tags employed are
monomethyl ethers of linear alkyl-α,ω-diols. The diol
would have N + 2 carbon atoms, where N designates the
stage. The methyl group would be a radiolabeled reagent
which would have any of a variety of ³H/¹⁴C ratios from 1/1
to m/1, where m is the number of choices. The double
radiolabel allows for accurate quantitation of the tritium
present in the tag. By having 10 different alkylene
groups and 10 different radioactive label ratios, 10^10
unique ten-member sets of tags are generated. Tags would
be attached by first reacting them with activating agents,
e.g., phosgene to form a chloroformate, followed by
reaction with the F¹-F² component. In this case, F¹-F² is
the o-nitro-p-carboxybenzyl alcohol protected as the
t-butyl ester. Each time a synthetic stage is carried out,
the de-esterified identifier is added directly to the
bead, which has covalently bonded amine or hydroxyl
groups, to form amides or esters with the acid activated
using standard chemistry, e.g., carbodiimide coupling
methodology. At the end of the sequential synthesis, the
beads are then screened with a variety of receptors or
enzymes to determine a particular characteristic. The
beads demonstrating the characteristic may then be
isolated, the tags detached and separated by HPLC to give
a series of glycol monomethyl ethers which may then be
analyzed for radioactivity by standard radioisotope



WO 94/08051 



PCT/US93/09345 



identification methods. For example, if the first and
second tags to elute from the HPLC column had ³H/¹⁴C ratios
of 5:1 and 7:1 respectively, then the product which showed
activity had been synthesized by reagent number 5 in
stage 1 and reagent number 7 in stage 2.
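
The decoding rule in this example is a direct lookup, sketched
here in Python. The sketch assumes that tags elute in stage
order (first tag to elute corresponds to stage 1, as in the
example above) and that a measured ratio is rounded to the
nearest whole number; both function names are hypothetical.

```python
def decode_radiolabels(ratios_in_elution_order):
    """Map each stage to a reagent number from measured 3H/14C ratios.

    Assumes the first-eluting tag belongs to stage 1, the next to
    stage 2, and so on, and that ratio m/1 denotes reagent number m.
    """
    return {stage: round(ratio)
            for stage, ratio in enumerate(ratios_in_elution_order, start=1)}


def number_of_tag_sets(n_stages, n_ratios):
    """Count the distinct tag sets: one ratio choice per stage."""
    return n_ratios ** n_stages


print(decode_radiolabels([5.0, 7.0]))
print(number_of_tag_sets(10, 10))
```

The second call reproduces the count of ten-member tag sets
available from 10 alkylene groups and 10 label ratios.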

EXAMPLE 3 

2401 Peptide Library 
The identifiers employed were 2-nitro-4-carboxybenzyl,
O-aryl substituted ω-hydroxyalkyl carbonate, where alkyl
was of from three to 12 carbon atoms and aryl was (A)
pentachlorophenyl, (B) 2,4,6-trichlorophenyl, or (C)
2,6-dichloro-4-fluorophenyl. The tags are designated as NAr,
wherein N is the number of methylene groups minus two and
Ar is the aryl group. Thus, tag 2A has a butylene group
bonded to the pentachlorophenyl through oxygen. The
subject tags can be easily detected using electron capture
gas chromatography at about 100 fmol.

In the subject analysis, the tagging molecules are
arranged in their GC elution order. Thus the tag which is
retained the longest on the GC column is designated T1 and
is associated with the least significant bit in the binary
synthesis code number, the next longest retained tag is
called T2 representing the next least significant binary
bit, and so on. Using a 0.2 mm x 20 m methylsilicone
capillary GC column, eighteen well-resolved tags were
obtained where T1 through T18 corresponded to 10A, 9A, 8A,
7A, 6A, 5A, 4A, 3A, 6B, 2A, 5B, 1A, 4B, 3B, 2B, 1B, 2C,
and 1C, respectively.



An encoded combinatorial library of 2401 peptides was
prepared. This library had the amino acid sequence
N-XXXXEEDLGGGG-bead, where the variable X residues were D,
E, I, K, L, Q, or S (single letter code). The 4 glycines
served as a spacer between the encoded amino acid sequence
and the bead. The combinatorial library included the
sequence H2N-KLISEEDL, part of the 10 amino acid epitope
which is known to be bound by 9E10, a monoclonal antibody
directed against the human c-myc gene product. For
encoding this library, three binary bits were sufficient
to represent the seven alternative reagents for each
stage. The code was as follows: 001 = S; 010 = I; 011 =
K; 100 = L; 101 = Q; 110 = E; 111 = D.

The library was synthesized by first preparing the
constant segment of the library H2N-EEDLGGGG-bead on 1.5 g
of 50-90 µ polystyrene synthesis beads functionalized with
1.1 meq/g of aminomethyl groups using standard solid phase
methods based on t-butyl side-chain protection and Fmoc
main chain protection (Stewart and Young, "Solid Phase
Peptide Synthesis", 2nd edition, Pierce Chemical Co.,
1984). After deprotecting the Fmoc groups with
diethylamine, the beads were divided into seven 200 mg
fractions and each fraction placed in a different
Merrifield synthesis vessel mounted on a single
wrist-action shaker. The beads in the seven vessels were
processed independently as follows (see Table 3-1). The
letter T in this example refers to the tag or to the
identifier incorporating that tag.

TABLE 3-1

Vessel  Step 1        Step 2     Step 3                 Step 4
No.
1       1% T1         DIC, wash  Fmoc(tBu)S, Anh.       Wash
2       1% T2         "          FmocI, Anh.            "
3       1% T1,T2      "          Fmoc(Boc)K, Anh.       "
4       1% T3         "          FmocL, Anh.            "
5       1% T1,T3      "          Fmoc(trityl)Q, Anh.    "
6       1% T2,T3      "          Fmoc(t-butyl)E, Anh.   "
7       1% T1,T2,T3   "          Fmoc(tBu)D, Anh.       "


In accordance with the above procedure a sufficient amount
of the identifiers listed in step 1 were attached via
their carboxylic acids using diisopropylcarbodiimide to
tag about 1% of the free amino groups on each bead in the
corresponding vessel. The remaining free amino groups on
each bead were then coupled in step 3 to N-protected amino
acid anhydrides. After washing with methylene chloride,
isopropanol, and N,N-dimethylformamide, the beads from the
seven vessels were combined and thoroughly mixed. At this
point the library had seven members.

After Fmoc deprotection (diethylamine), the beads were
again divided into seven vessels and processed as before
except that in place of the identifiers used previously,
identifiers representing the second stage (T4-6) were
used. By repeating the procedure two more times, using
identifiers T7-9 and then T10-12 analogously, the entire
uniquely encoded library of 7^4 = 2401 different peptides was
prepared using only 12 identifiers.
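
The three-bit code of this example can be written out as a small
encoder and decoder. The sketch below is illustrative only. It
assumes that the variable residue nearest the constant segment
is coupled first (stage 1, tags T1-T3), that each stage uses the
next three tags, and that the lowest-numbered tag of a stage is
its least significant bit, consistent with Table 3-1. The
function names are hypothetical.

```python
CODE = {"S": 0b001, "I": 0b010, "K": 0b011, "L": 0b100,
        "Q": 0b101, "E": 0b110, "D": 0b111}
LETTER = {value: residue for residue, value in CODE.items()}
BITS = 3  # tags per stage


def encode(variable_residues):
    """Return the tag numbers for residues written N-terminus first."""
    tags = set()
    # The last residue written is the first one coupled (stage 1).
    for stage, residue in enumerate(reversed(variable_residues)):
        for bit in range(BITS):
            if (CODE[residue] >> bit) & 1:
                tags.add(stage * BITS + bit + 1)
    return tags


def decode(tags, n_stages=4):
    """Rebuild the variable residues (N-terminus first) from tag numbers."""
    residues = []
    for stage in range(n_stages):
        value = sum(1 << bit for bit in range(BITS)
                    if stage * BITS + bit + 1 in tags)
        residues.append(LETTER[value])   # KeyError if no tag was found
    return "".join(reversed(residues))


print(sorted(encode("KLIS")))
print(decode(encode("KLIS")))
```

Under these assumptions a bead carrying the KLIS variable segment
would show peaks for T1, T5, T9, T10 and T11 only.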



To read the synthesis code from a single selected bead,
the bead was first washed four times in a small centrifuge
tube with 100 µL portions of DMF, and then resuspended in
1 µL of DMF in a Pyrex capillary tube. After 2 hrs of
photolysis with a Rayonet 350 nm light source, the tags
released from the bound identifiers were silylated using
about 0.1 µL bis-trimethylsilylacetamide and the solution
injected into a Hewlett Packard capillary gas
chromatograph equipped with a 0.2 mm x 20 m methylsilicone
fused silica capillary column and an electron capture
detector. The binary synthesis code of the selected bead
was directly determined from the chromatogram of the tags
which resulted.
EXAMPLE 4

Benzodiazepine Library

A combinatorial benzodiazepine library comprising 30
compounds of the formula VIII

[Structure of formula VIII not reproduced]

wherein:

R is CH3, CH(CH3)2, CH2CO2H, (CH2)4NH2, CH2C6H4OH, or CH2C6H5
and
R¹ is H, CH3, C2H5, CH2CH=CH2, or CH2C6H5

is constructed per the following scheme.

[Reaction scheme for Steps A through G not reproduced. Labels
legible in the original scheme: TFA, DCM (Step B); tags;
20% piperidine in DMF; 5% AcOH/DMF, 60 °C; lithiated
5-(phenylmethyl)-2-oxazolidinone, THF, -78 °C; R¹X, DMF
(X = bromine or iodine); TFA:H2O:dimethyl sulfide 95:5:10;
hν (350 nm) (Step G). The circled symbol denotes polystyrene
resin.]

The benzodiazepines VIII are constructed on polystyrene
beads similarly to the method of Bunin and Ellman (JACS,
114, 10997-10998 [1992]) except that a photolabile linker
is incorporated between the bead and the benzodiazepine
(see steps A, B, and C), thus allowing the benzodiazepine
to be removed in step G non-hydrolytically by exposure to
U.V. light (350 nm in DMF for 10 minutes to 12 hr).
Additionally, binary codes are introduced in steps D and
E which allow for a precise determination of the reaction
sequence used to introduce each of the 6 R's and 5 R¹'s.
After removal of the tags according to step H and analysis
by electron capture detection following GC separation, the
nature of the individual R and R¹ groups is determined.



Steps D, E, and F essentially follow the procedure of
Bunin and Ellman, but also include the incorporation of
identifiers IXa-c in step D and IXd-f in Step E. The
identifiers are all represented by Formula IX,

[Structure of formula IX not reproduced]

wherein:

IXa indicates n=6;
IXb indicates n=5;
IXc indicates n=4;
IXd indicates n=3;
IXe indicates n=2; and
IXf indicates n=1.

The codes for each of R and R¹ are as follows:

Table 4-1

IX      R
a       CH3
b       CH(CH3)2
a,b     CH2CO2H
c       (CH2)4NH2
a,c     CH2-C6H4-4-OH
b,c     CH2C6H5

IX      R¹
d       H
e       CH3
d,e     C2H5
f       CH2CH=CH2
d,f     CH2C6H5
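
Table 4-1 can be read as two lookup tables keyed by the set of
identifiers found on a bead. The Python sketch below shows one
way to decode a bead under that reading; the dictionary and
function names are hypothetical, and the substituents are
written as plain strings.

```python
R_CODES = {
    frozenset("a"): "CH3",
    frozenset("b"): "CH(CH3)2",
    frozenset("ab"): "CH2CO2H",
    frozenset("c"): "(CH2)4NH2",
    frozenset("ac"): "CH2-C6H4-4-OH",
    frozenset("bc"): "CH2C6H5",
}
R1_CODES = {
    frozenset("d"): "H",
    frozenset("e"): "CH3",
    frozenset("de"): "C2H5",
    frozenset("f"): "CH2CH=CH2",
    frozenset("df"): "CH2C6H5",
}


def decode_bead(identifiers):
    """Return (R, R1) for the identifier letters detected on one bead."""
    found = set(identifiers)
    r_key = frozenset(found & set("abc"))     # identifiers added in step D
    r1_key = frozenset(found & set("def"))    # identifiers added in step E
    return R_CODES[r_key], R1_CODES[r1_key]   # KeyError for an unused code


print(decode_bead({"a", "c", "d", "e"}))
print(len(R_CODES) * len(R1_CODES))
```

The product of the two table sizes, 6 x 5, matches the 30
compounds stated for this library.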
Step A

To a solution of I (1 equiv) in toluene (conc. = 0.5 M) is
added the Fmoc protected
2-amino-5-chloro-4'-hydroxybenzophenone (1.3 eq) and
diethyl azodicarboxylate (1.3 eq) and triphenylphosphine
(1.3 eq). The mixture is stirred
at room temperature for 24 hr. The solvent is removed in
vacuo and the residue triturated with ether and filtered
and the solvent again removed in vacuo. The resultant
product II is purified by chromatography on silica gel.

Step B

To a solution of II in DCM (0.2 M) stirring at r.t. is
added TFA (3 equiv.) and the solution is allowed to stir
for 12 hr. The solution is evaporated to dryness in vacuo
and the residue dissolved in DCM, washed once with brine
and dried (Na2SO4). Filtration and evaporation of the
solvent affords III.

Step C

1% DVB (divinylbenzene) cross-linked polystyrene beads
(50 µ) functionalized with aminomethyl groups (1.1 mEq/g)
are suspended in DMF in a peptide reaction vessel
(Merrifield vessel). III (2 equiv) and HOBt (3 equiv) in
DMF are added and the vessel shaken for 10 min. DIC (3 eq)
is added and the vessel is shaken until a negative
Ninhydrin test indicates completion of the reaction after
12 hr.
The DMF is removed and the resin washed with additional
DMF (x5) and DCM (x5) before drying in vacuo.

Step D

The dry resin is divided into 6 reaction vessels and is
suspended in DCM. The appropriate combinations of
identifiers IXa-c (see Table 4-1) are added to the flasks
and shaken for 1 hr. The Rh(TFA)2 catalyst (1 mol%) is
added to each flask and shaken for an additional 2 hr.
The flasks are drained and the resin washed with DCM (x5).
The resin is then treated with a solution of TFA in DCM
(0.01 M) and shaken for 30 min, and then washed again with
DCM (x3) followed by DMF (x2). The resin is treated with
a 20% solution of piperidine in DMF and shaken for 30 min
and is then washed with DMF (x3) and DCM (x3).

To each flask is added the appropriate Fmoc protected
amino acylfluoride (3 equiv) (when required, side-chain
functional groups are protected as tert-butyl ester (Asp),
tert-butyl ether (Tyr) or tert-butyloxycarbonyl (Lys))
with 2,6-di-tert-butyl-4-methylpyridine (10 equiv) and the
flasks shaken overnight or until a negative Ninhydrin test
is achieved. The resin is washed once (DCM) and then the
six batches are combined and washed again (DCM, x5) before
drying in vacuo.
Step E

The dry resin is divided into five reaction vessels and is
suspended in DCM. The appropriate combinations of
identifiers IXd-f (see Table 4-1) are added to the flasks
and shaken for 1 hr. The Rh(TFA)2 catalyst (1 mol%) is
added to each flask and shaken for an additional 2 hr.
The flasks are drained and the resin washed with DCM (x5).
The resin is then treated with a solution of TFA in DCM
(0.01 M) and shaken for 30 min and is then washed with
DMF (x3) and DCM (x3).

To each flask is added a solution of 5% acetic acid in DMF
and the mixtures are heated to 60 °C and shaken overnight.
The solvent is drained and then the resin washed with DMF
(x5).

Step F

Each batch of resin is suspended in THF and the flasks are
cooled to -78 °C. To each flask is added a solution of
lithiated 5-(phenylmethyl)-2-oxazolidinone (2 equiv) in
THF and the mixtures are shaken at -78 °C for 1 hr. The
appropriate alkylating agent (Table 4-2) (4 equiv) is then
added to each reaction flask followed by a catalytic
amount of DMF. The vessels are allowed to warm to ambient
temperature and shaken at this temperature for 5 hrs. The
solvent is removed by filtration and the resin washed with
THF (x1) and then dried in vacuo. The batches of resin
are then combined and washed with THF (x2) and DCM (x2)
and the combined resin is then treated with a 95:5:10
mixture of TFA:water:dimethyl sulphide for 2 hrs to remove
the side chain protecting groups.



TABLE 4-2

IDENTIFIER    ALKYLATING AGENT
e             H3CI
d,e           C2H5Br
f             BrCH2-CH=CH2
d,f           BrCH2C6H5


Step G 

The resultant benzodiazepine can be cleaved from a bead of 
polystyrene by suspending the bead in DMF and irradiating 
with U.V. (350 nm) for 12 hrs. 

Step H

A bead of interest is placed into a glass capillary tube.
Into the tube is syringed 1 µL of 1 M aqueous cerium (IV)
ammonium nitrate (CAN) solution, 1 µL of acetonitrile and
2 µL of hexane. The tube is flame sealed and then
centrifuged to ensure that the bead is immersed in the
reagents. The tube is placed in an ultrasonic bath and
sonicated from 1 to 10 hrs, preferably from 2 to 6 hrs.
The tube is cracked open and ~1 µL of the upper hexane
layer is mixed with ~0.2 µL of
bis(trimethylsilyl)acetamide (BSA) prior to injection into
the GC and each
tag member determined using electron capture detection, as
exemplified in the following scheme.
EXAMPLE 5

117,649 Peptide Library

An encoded library of 117,649 peptides was prepared. This
library had the sequence H2N-XXXXXXEEDLGGGG-bead, where the
variable residue X was D, E, I, K, L, Q or S. This library was
encoded using the 18 tags as defined in Example 3; three
binary bits being sufficient to represent the seven amino
acids used in each step. The code was: 001=S; 010=I;
011=K; 100=L; 101=Q; 110=E; and 111=D, where 1 indicates
the presence and 0 indicates the absence of a tag.

The constant segment of the library (H2N-EEDLGGGG-bead) was
synthesized on 1.5 g of 50-80 µ Merrifield polystyrene
synthesis beads functionalized with 1.1 mEq/g of
aminomethyl groups using standard solid phase methods
based on t-Bu sidechain protection and Fmoc mainchain
protection. After deprotecting the N-terminal Fmoc
protecting group with diethylamine, the beads were divided
into seven 200 mg portions, each portion being placed into
a different Merrifield synthesis vessel mounted on a
single wrist-action shaker.

The beads in the seven vessels were processed as in Table
3-1 to attach the sets of identifiers (T1-T3) and the
corresponding amino acid to each portion except that
instead of DIC, i-butylchloroformate was used for
activation.






This procedure first chemically attached small amounts of
appropriate identifiers via their carboxylic acids to the
synthesis beads. This attachment was achieved by
activating the linker carboxyl groups as mixed carbonic
anhydrides using isobutylchloroformate, and then adding an
amount of activated identifier corresponding to 1% of the
free amino groups attached to the beads. Thus, about 1%
of the free amino groups were terminated for each
identifier added. The remaining free amino groups were
then coupled in the usual way with the corresponding
protected amino acids activated as their symmetrical
anhydrides.

After washing, the seven portions were combined and the
Fmoc protected amino groups were deprotected by treatment
with diethylamine. The beads were again divided into
seven portions and processed as before, except that the
appropriate identifiers carrying tags T4, T5, and T6 were
added to the reaction vessels.

The procedure of dividing, labelling, coupling the amino
acid, combining and main-chain deprotection was carried out
a total of six times using identifiers bearing tags
T1-T18, affording an encoded peptide library of 117,649
different members.
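
The library sizes and tag counts quoted in Examples 1, 3 and 5
all follow from the same arithmetic, sketched below in Python.
The sketch is illustrative: the function names are hypothetical,
and the allow_blank_code flag reflects an observation about the
examples (Example 1 uses the all-zero code for amino acid
Number 1, whereas Examples 3 and 5 leave 000 unused) rather than
a rule stated in the text.

```python
def bits_per_stage(n_choices, allow_blank_code):
    """Smallest number of binary tags that can label n_choices reagents.

    If allow_blank_code is True, "no tag present" counts as a valid code.
    """
    usable = n_choices - 1 if allow_blank_code else n_choices
    return usable.bit_length()


def tags_required(n_choices, n_stages, allow_blank_code=False):
    """Total number of distinct tags for the whole synthesis."""
    return n_stages * bits_per_stage(n_choices, allow_blank_code)


def library_size(n_choices, n_stages):
    """Number of distinct products from a full split-and-mix synthesis."""
    return n_choices ** n_stages


print(library_size(7, 6), tags_required(7, 6))
print(library_size(7, 4), tags_required(7, 4))
print(library_size(16, 5), tags_required(16, 5, allow_blank_code=True))
```

The three printed pairs correspond to the 117,649-member library
with 18 tags, the 2401-member library with 12 tags, and the
1,048,576-member pentapeptide library with 20 tags.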



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Tvpical Identifier Prep aration 

To a solution of 8-bromo-l-octanol (0.91 g, 4-35 mmol) and 
2,4,6-trichlorophenol (1.03 g, 5.22 mmol) in DMF (5 mL) 
was added cesium carbonate (1.70 g, 5.22 mmol) resulting 
5 in the evolution of gas and the precipitation of a white 
solid. The reaction was stirred at 80' C for 2 hrs. The 
mixture was diluted with toluene (50 mL) and poured into 
a separatory funnel, washed with 0.5 N NaOH (2x50 mL) , IN 
HC1 (2x50 mL) and water (50 mL) and the organic phase was 
10 dried (MgS0 4 ) . Removal of the solvent by evaporation gave 
1.24 g (87% yield) of tag as a clear oil. 
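
The quantities quoted for this step can be cross-checked from the masses. The sketch below is illustrative only: the molecular formulas are inferred from the compound names (the product is assumed to be the 8-(2,4,6-trichlorophenoxy)octanol aryl ether), and the atomic masses are rounded standard values; none of these figures appear in the specification.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999,
               "Cl": 35.45, "Br": 79.904, "Cs": 132.905}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol for a formula given as {element: count}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def mmol(mass_g: float, formula: dict) -> float:
    """Amount of substance in mmol for a weighed mass in grams."""
    return 1000.0 * mass_g / molar_mass(formula)


# Molecular formulas inferred from the compound names (assumption).
BROMOOCTANOL = {"C": 8, "H": 17, "Br": 1, "O": 1}
TRICHLOROPHENOL = {"C": 6, "H": 3, "Cl": 3, "O": 1}
CESIUM_CARBONATE = {"Cs": 2, "C": 1, "O": 3}
TAG_ALCOHOL = {"C": 14, "H": 19, "Cl": 3, "O": 2}  # assumed aryl ether product

limiting = mmol(0.91, BROMOOCTANOL)
product = mmol(1.24, TAG_ALCOHOL)
print(f"8-bromo-1-octanol:     {limiting:.2f} mmol")
print(f"2,4,6-trichlorophenol: {mmol(1.03, TRICHLOROPHENOL):.2f} mmol")
print(f"cesium carbonate:      {mmol(1.70, CESIUM_CARBONATE):.2f} mmol")
print(f"yield of tag:          {100 * product / limiting:.1f} %")
```

Under these assumptions the computed amounts agree with the stated 4.35 and 5.22 mmol, and the computed yield falls between 87% and 88%.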



The above tag (0.81 g, 2.5 mmol) was added to a 2 M
solution of phosgene in toluene (15 mL) and stirred at
room temperature for 1 hr. The excess phosgene and the
toluene were removed by evaporation and the resulting
crude chloroformate was dissolved in DCM (5 mL) and
pyridine (0.61 mL, 7.5 mmol).
tert-Butyl 4-hydroxymethyl-3-nitrobenzoate (Barany and
Albericio, J. Am. Chem. Soc. (1985), 107, 4936-4942)
(0.5 g, 1.98 mmol) was added
and the reaction mixture stirred at room temperature for
3 hrs. The solution was diluted with ethyl acetate (75
mL) and poured into a separatory funnel. After washing
with 1 N HCl (3x35 mL), saturated NaHCO3 (2x35 mL) and brine
(35 mL), the organic phase was dried (MgSO4). The solvent
was removed by evaporation and the residue purified by
chromatography on silica gel (5% to 7.5% ethyl acetate in
petroleum ether) affording 0.95 g (79% yield) of the
identifier tert-butyl ester as a clear oil.



Trifluoroacetic acid (3 mL) was added to a solution of the
identifier tert-butyl ester (0.95 g, 1.57 mmol) in DCM (30
mL) to deprotect the linker acid (i.e., F1-F2 of Formula I)
and the solution was stirred at room temperature for 7
hrs. The mixture was then evaporated to dryness and the
residue redissolved in DCM (30 mL). The solution was
washed with brine (20 mL) and the organic phase dried
(MgSO4). Removal of the solvent by evaporation gave 0.75
g (87% yield) of the identifier (6B) as a pale yellow
solid. (Tag nomenclature is the same as in Example 3).

Typical Encoded Library Synthesis Step

Nα-Fmoc-E(tBu)-E(tBu)-D(tBu)-L-G4-NH-resin was suspended
in DMF (20 mL) and shaken for 2 min. After filtering, 1:1
diethylamine:DMF (40 mL) was added to remove the Fmoc
protecting groups and the resin was shaken for 1 hr. The
resin was separated by filtration and washed with DMF
(2x20 mL, 2 min each), 2:1 dioxane:water (2x20 mL, 5 min
each), DMF (3x20 mL, 2 min each), DCM (3x20 mL, 2 min
each) then dried in vacuo at 25 °C. (The resin was found
to have 0.4 mmol/g amino groups by picric acid titration
at this stage.)

150 mg portions of the resin were placed into seven
Merrifield vessels and suspended in DCM (5 mL). The
appropriate identifiers were activated as their acyl
carbonates as follows (for the first coupling): T1 (6.6
mg, 0.0098 mmol) was dissolved in anhydrous ether (2 mL)
and pyridine (10 µL) was added. Isobutyl chloroformate
(1.3 µL, 0.0096 mmol) was added as a solution in anhydrous
ether (0.1 mL). The resulting mixture was stirred at 25 °C
for 1 hr, during which time a fine white precipitate
formed. The stirring was stopped and the precipitate was
allowed to settle for 30 min. Solutions of the
acylcarbonates of T2 and T3 were prepared in the same way.
Aliquots (0.25 mL) of the supernatant solution of
activated identifiers were mixed to give the appropriate
3-bit binary tag codes and the appropriate coding mixtures
of identifiers were added to each of the seven synthesis
vessels. The vessels were shaken in the dark for 12 hrs,
and then each was washed with DCM (4x10 mL, 2 min each).
A solution of the symmetrical anhydride of an Nα-Fmoc
amino acid in DCM (3 equivalents in 10 mL) was then added
to the corresponding coded batch of resin and shaken for
20 min. 5% N,N-diisopropylethylamine in DCM (1 mL) was
added and the mixture shaken until the resin gave a
negative Kaiser test.
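
The 3-bit coding mixtures for the seven vessels can be sketched as follows. This is an illustration only: the assignment of bit positions to T1, T2 and T3 is an assumption, since the passage above does not state which identifier carries which bit, and the function name is hypothetical.

```python
TAGS = ("T1", "T2", "T3")  # identifiers activated for the first coupling


def coding_mixture(vessel: int, tags=TAGS) -> list:
    """Identifiers whose activated supernatants are combined for one vessel.

    Assumption: bit i of the vessel number (1..7) selects tags[i]; the
    real bit-to-tag assignment is not given in the passage above.
    """
    if not 1 <= vessel < 2 ** len(tags):
        raise ValueError("vessel number must map to a non-zero binary code")
    return [tag for i, tag in enumerate(tags) if (vessel >> i) & 1]


for vessel in range(1, 8):
    print(vessel, format(vessel, "03b"), coding_mixture(vessel))
```

Every vessel receives a distinct, non-empty subset of the three identifiers, so the later detection of any tag subset on a bead points to exactly one vessel.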

The resin batches were filtered and combined, and then
washed with DCM (4x20 mL, 2 min each), isopropanol (2x20
mL, 2 min each), DCM (4x20 mL, 2 min each). The next
cycle of labelling/coupling was initiated by Fmoc
deprotection as described above.

After Fmoc deprotection of the residues in the last
position of the peptide, the side chain functionality was
deprotected by suspending the resin in DCM (10 mL), adding
thioanisole (2 mL), ethanedithiol (0.5 mL) and
trifluoroacetic acid (10 mL) then shaking for 1 hr at 25 °C.
The resin was then washed with DCM (6x20 mL, 2 min each)
and dried.

Electron Capture Gas Chromatography Reading of Code

A single, selected bead was placed in a Pyrex capillary
tube and washed with DMF (5x10 µL). The bead was then
suspended in DMF (1 mL) and the capillary was sealed. The
suspended bead was irradiated at 366 nm for 3 hrs to
release the tag alcohols, and the capillary tube
subsequently placed in a sand bath at 90 °C for 2 hrs.
The tube was opened and bis-trimethylsilyl acetamide (0.1
mL) was added to trimethylsilylate the tag alcohols.
After centrifuging for 2 min., the tag solution above the
bead (1 µL) was injected directly into an electron capture
detection capillary gas chromatograph for analysis. Gas
chromatography was performed using a Hewlett Packard
Series II Model 5890 gas chromatograph equipped with a 0.2
mm x 20 m methylsilicone fused silica capillary column and
an electron capture detector. Photolysis reactions were
performed using a UVP "Black Ray" model UVL 56 hand-held
366 nm lamp.

Antibody Affinity Methods

The anti-c-myc peptide monoclonal antibody 9E10 was
prepared from ascites fluid as described in Evans et al.,
Mol. Cell Biol., 5, 3610-3616 (1985) and Munro and Pelham,
Cell, 48, 899-907 (1987). To test beads for binding to
9E10, beads were incubated in TBST [20 mM Tris-HCl (pH
7.5), 500 mM NaCl and 0.05% Tween-20] containing 1% bovine
serum albumin (BSA) to block non-specific protein binding
sites. The beads were then centrifuged, resuspended in a
1:200 dilution of 9E10 ascites fluid in TBST + 1% BSA and
incubated overnight at 4 °C. Beads were subsequently
washed three times in TBST and incubated for 90 min. at
room temperature in alkaline phosphatase-coupled goat
antimouse IgG antibodies (Bio-Rad Laboratories), diluted
1:3000 in TBST + 1% BSA. After washing the beads twice in
TBST and once in phosphatase buffer (100 mM Tris-HCl, pH
9.5, 100 mM NaCl and 5 mM MgCl2), the beads were incubated
1 hr at room temperature in phosphatase buffer containing
one one-hundredth part each of AP Color Reagents A & B
(Bio-Rad Laboratories). To stop the reaction, the beads
were washed twice in 20 mM sodium EDTA, pH 7.4. Solution
phase affinities between 9E10 and various peptides were
determined by a modification of the competitive ELISA
assay described by Harlow et al., Antibodies: a Laboratory
Manual, 570-573, Cold Spring Harbor Press, Cold Spring
Harbor, N.Y.

From a 30 mg sample of the combinatorial library of
peptides, 40 individual beads were identified which
stained on exposure to the anti-c-myc monoclonal antibody.
Decoding of these positive-reacting beads established the
ligand's reaction sequence as the myc epitope (EQKLISEEDL)
or sequences that differed by one or two substituents
among the three N-terminal residues.

EXAMPLE 6

23,540,625 Mixed Amide Library

The encoding technique was tested further by the
preparation of a combinatorial library of 23,540,625
members consisting of peptides and other amide compounds.



The synthesis was carried out using 15 different reagents
in 5 steps and 31 different reagents in the sixth step.
Four identifiers were used to encode each of the 5 steps
with 15 reagents and five identifiers were used in the
final step with 31 reagents. A label set of 25
identifiers was therefore prepared. 2-Nitro-4-carboxybenzyl,
O-aryl substituted ω-hydroxyalkyl carbonate
identifiers were employed, where the tag components
comprised an alkyl moiety of from 3 to 12 carbon atoms
and the aryl moieties were (A) pentachlorophenyl, (B)
2,4,5-trichlorophenyl, (C) 2,4,6-trichlorophenyl, or (D)
2,6-dichloro-4-fluorophenyl. A set of 25 tags was
prepared using appropriate alkyl chain lengths with A, B,
C or D, separable using a 0.2 mm x 25 m methylsilicone GC
column. The chemical compositions of tags T1-T25 (where
T1 represents the tag with the longest retention time, and
T25 the tag with the shortest retention time) are
summarized below:



T1   10A     T6   10C     T11  7B      T16  5C      T21  2B
T2   9A      T7   9B      T12  7C      T17  4B      T22  2C
T3   8A      T8   9C      T13  6B      T18  4C      T23  1B
T4   7A      T9   8B      T14  6C      T19  3B      T24  1C
T5   10B     T10  8C      T15  5B      T20  3C      T25  2D

The designations 10A, 9A, etc., are as described in Example
3.
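
The library size in the title of this example and the size of the label set can both be reproduced from the reagent counts given above (15 reagents in each of five steps, 31 in the sixth). A minimal sketch, with illustrative variable names:

```python
from math import prod

# Reagent choices per step, as stated for this example.
reagents_per_step = [15, 15, 15, 15, 15, 31]

# One library member per combination of choices.
library_members = prod(reagents_per_step)

# Each step needs enough binary tags to give every reagent a non-zero code:
# 15 reagents -> 4 tags, 31 reagents -> 5 tags.
identifiers = sum(n.bit_length() for n in reagents_per_step)

print(library_members)  # 23540625
print(identifiers)      # 25
```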

The fifteen reagents used in the first five stages and the
code identifying them are represented below where 1
represents the presence of tag and 0 the absence thereof.

REAGENT                         CODE
L-serine                        (0001)
D-serine                        (0010)
L-glutamic acid                 (0011)
D-glutamic acid                 (0100)
L-glutamine                     (0101)
D-glutamine                     (0110)
L-lysine                        (0111)
D-lysine                        (1000)
L-proline                       (1001)
D-proline                       (1010)
L-phenylalanine                 (1011)
D-phenylalanine                 (1100)
3-aminobenzoic acid             (1101)
4-aminophenylacetic acid        (1110)
3,5-diaminobenzoic acid         (1111)

The 31 reagents and the code representing them in the
sixth stage are represented below:



REAGENT                         CODE
L-serine                        (00001)
D-serine                        (00010)
L-glutamic acid                 (00011)
D-glutamic acid                 (00100)
L-glutamine                     (00101)
D-glutamine                     (00110)
L-lysine                        (00111)
D-lysine                        (01000)
L-proline                       (01001)
D-proline                       (01010)
L-phenylalanine                 (01011)
D-phenylalanine                 (01100)
3-aminobenzoic acid             (01101)
4-aminophenylacetic acid        (01110)
3,5-diaminobenzoic acid         (01111)
Succinic anhydride              (10000)
Tiglic acid                     (10001)
2-pyrazinecarboxylic acid       (10010)
(±)-thioctic acid               (10011)
1-piperidinepropionic acid      (10100)
piperonylic acid                (10101)
6-methylnicotinic acid          (10110)
3-(2-thienyl)acrylic acid       (10111)
methyl iodide                   (11000)
tosyl chloride                  (11001)
p-toluenesulfonyl isocyanate    (11010)
3-cyanobenzoic acid             (11011)
phthalic anhydride              (11100)
acetic anhydride                (11101)
ethyl chloroformate             (11110)
mesyl chloride                  (11111)
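
In both tables the code is simply the reagent's position in the list written in binary, with the all-zero pattern left unused. The sketch below illustrates encoding and decoding on that basis; the function names are hypothetical and, for brevity, only the fifteen reagents common to all six stages are listed.

```python
STEP_REAGENTS = [
    "L-serine", "D-serine", "L-glutamic acid", "D-glutamic acid",
    "L-glutamine", "D-glutamine", "L-lysine", "D-lysine",
    "L-proline", "D-proline", "L-phenylalanine", "D-phenylalanine",
    "3-aminobenzoic acid", "4-aminophenylacetic acid",
    "3,5-diaminobenzoic acid",
]


def encode(reagent: str, reagents=STEP_REAGENTS, bits: int = 4) -> str:
    """Binary code of a reagent: its 1-based list position in `bits` digits."""
    return format(reagents.index(reagent) + 1, f"0{bits}b")


def decode(code: str, reagents=STEP_REAGENTS) -> str:
    """Reagent for a detected code; 1 = tag present, 0 = tag absent."""
    position = int(code, 2)
    if position == 0:
        raise ValueError("the all-zero code is not assigned to any reagent")
    return reagents[position - 1]


print(encode("L-lysine"))          # 0111
print(encode("L-lysine", bits=5))  # 00111
print(decode("1101"))              # 3-aminobenzoic acid
```

Reserving the all-zero pattern means that a step with no detectable tags is never mistaken for a valid reagent choice.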



A spacer of six glycine units was prepared on the
beads using standard methods. The variable region was
constructed using butyl sidechain protection, and amino
groups were protected as Fmoc derivatives. Amide bonds
were formed by activation of the carboxylic acid with DIC
and HOBt.

EXAMPLE 7

Hetero-Diels-Alder Library

A combinatorial hetero Diels-Alder library comprising 42
compounds of the formula:

[Structural formula not reproduced]

wherein:
R1 is H, CH3O, F3C, F3CO, H5C6O, or C6H11;
R2 is H, CH3, or CH3O;
R3 is H (when n=2), or CH3 (when n=1); and





was constructed per the following scheme:

[Reaction scheme not reproduced. Legible labels: Step C;
Step D: 1) toluene, heat, 2) identifiers; Step E:
1) identifiers, 2) enol ether, BF3·Et2O, DCM.]

The azatricyclic products (VI) were constructed on
polystyrene beads and were linked to the beads by a
photocleavable linker allowing the azatricycle (VII) to be
removed from the bead by exposure to U.V. light (350 nm in
DMF). The binary codes introduced in steps C, D and E
allow a unique determination of the reaction sequence used
to introduce Ar, R, R1, R2 and R3. The encoding tags were
removed according to step G and analyzed by electron
capture detection following GC separation.

The identifiers used in this scheme are represented by the
formula X:

[Structural formula X not reproduced]

wherein:
Xa indicates n=10
Xb indicates n=9
Xc indicates n=8
Xd indicates n=7
Xe indicates n=6
Xf indicates n=5
Xg indicates n=4

The codes for each of R, R1, R2, R3 are as follows:

TABLE 7-1

CODE        SUBSTITUENTS
a           Ar = [structure not reproduced], R = H
b           Ar = [structure not reproduced], R = Cl
a,b         [structure not reproduced]
c           R1 = H, R2 = H
d           R1 = H, R2 = CH3
d,c         R1 = CH3O, R2 = OCH3
e           R1 = CF3, R2 = H
e,c         R1 = C6H5O, R2 = H
e,d         R1 = F3CO, R2 = H
e,d,c       R1 = C6H11, R2 = H
f           R3 = CH3, n = 1
g           R3 = H, n = 2



Step A

To a solution of I (2.03 g, 8 mmol), 4-hydroxybenzaldehyde
(1.17 g, 9.6 mmol) and triphenylphosphine (2.73 g, 10.4
mmol) in toluene (20 mL) stirring at 0 °C was added over a
period of 30 minutes diethylazodicarboxylate. The
solution was allowed to warm and stirred for 1 hour once
ambient temperature had been reached. The solution was
concentrated by removal of approximately half of the
solvent in vacuo and was then triturated with ether. The
mixture was then filtered and the residue was washed
thoroughly with ether. The solvent was removed in vacuo
and the residue was purified by chromatography on silica
gel (15% ethyl acetate in hexane) affording 1.3 g of the
ether IIa (47% yield).

2-Chloro-4-hydroxybenzaldehyde and
2-hydroxy-1-naphthaldehyde were coupled to I in analogous
fashion affording ethers IIb and IIc in yields of 91% and
67%, respectively.

Step B 

To a solution of ether IIa (0.407 g, 1.14 mmol) in DCM (20
mL) stirring at room temperature was added TFA (8 mL).
The solution was allowed to stir for 6 hrs. The solution
was evaporated to dryness in vacuo affording 0.343 g of
acid IIIa (100% yield). Ethers IIb and IIc were
deprotected analogously affording acids IIIb and IIIc in
yields of 92% and 100% respectively.

Step C

Into a peptide reaction vessel (Merrifield vessel) were
measured 1% DVB (divinylbenzene) cross-linked polystyrene
beads (50-80 µ) functionalized with aminomethyl groups (1.1
meq/g) (200 mg of resin). The resin was suspended in DMF
(2 mL) and shaken for 20 min. The acid IIIa (38 mg, 2
equiv.), 1-hydroxybenzotriazole (40 mg, 2 equiv) and
diisopropylcarbodiimide (38 mg, 2 equiv) were added and
the mixture shaken until a negative Ninhydrin test was
achieved (22 hr). The solution was removed by filtration
and the resin was washed with DCM (8x 10 mL).

The resin was resuspended in DCM (5 mL), identifier Xa (15
mg) was added and the flask was shaken for 1 hr. Rh(TFA)2
catalyst (1 mol%) was added and the flask shaken for 2
hrs. The solvent was removed by filtration and the resin
resuspended in DCM (5 mL). Trifluoroacetic acid (1 drop)
was added and the vessel shaken for 20 min. The solvent
was removed by filtration, and the resin was washed with
DCM (8x 10 mL).

In an analogous fashion, acids IIIb and IIIc were attached
to the resin and were encoded with the appropriate
identifiers, i.e., Xb for acid IIIb and Xa and Xb for acid
IIIc. The three batches of resin were combined, mixed,
washed, and dried.

Step D

The dry resin was divided into 7 equal portions (87 mg)
which were put into seven peptide reaction vessels
(Merrifield vessels) which were wrapped with heat tape.
The resin in each vessel was suspended in toluene (10 mL)
and shaken for 20 min. An appropriate amount of one
aniline was then added to each flask (see Table 7-2).

TABLE 7-2

FLASK   ANILINE                       AMOUNT ADDED
1       Aniline                       3 mL
2       3,5-dimethylaniline           3 mL
3       3,4,5-trimethoxyaniline       2 g
4       4-trifluoromethylaniline      3 mL
5       4-phenoxyaniline              2 g
6       4-trifluoromethoxyaniline     3 mL
7       4-cyclohexylaniline           2 g

The heating tape was connected and the reaction
mixtures shaken at 70 °C for 18 hrs. The heat tape was
disconnected and the solvent was removed by filtration and
each batch of resin was washed with dry DCM (4x 10 mL),
ether (10 mL), toluene (10 mL) and DCM (2x 10 mL). Each
of the portions was then suspended in DCM (5 mL) and to
each flask was added the appropriate identifier or
combination of identifiers (Xc-e) (15 mg) (see Table 7-1).
The flasks were shaken for 1 hr. and then Rh(TFA)2 (1 mol%)
was added to each flask and shaking continued for 2 hrs.

The solvent was then removed and each batch of resin was
re-suspended in DCM (5 mL) to which was added TFA (1
drop). This mixture was shaken for 20 min., then the
solvent was removed by filtration. The batches of resin
were then washed (DCM, 1x 10 mL) and combined, washed
again with DCM (3x 10 mL) and then dried thoroughly in
vacuo.

Step E

The dried resin was divided into two equal portions (0.3
g) and each was placed in a peptide reaction vessel. The
resin batches were washed with DCM (2x 10 mL) and then
resuspended in DCM (5 mL). To one flask was added the
identifier Xf (15 mg) and to the other was added Xg (15
mg). The flasks were shaken for 1 hr. prior to the
addition of Rh(TFA)2 catalyst (1 mol%). The flasks were
shaken for 2 hrs. and then the solvent was removed by
filtration. Each batch of resin was washed with DCM (3x
10 mL), and each was then resuspended in DCM (5 mL).

The appropriate enol ether (1 mL) (see Table 7-1) was added
to the flasks and the vessels shaken for 30 min. To each
flask was added a solution of BF3·OEt2 (0.5 mL of a 5%
solution in DCM) and the flasks were shaken for 24 hrs.
Removal of the solvent by filtration was followed by
washing of the resin with DCM (10 mL) and the resin was
then combined. The beads were then washed further with
DCM (5x 10 mL), DMF (2x 10 mL), methanol (2x 10 mL) and DCM
(2x 10 mL). The resin was then dried thoroughly in vacuo.

Step F 

To confirm the identity of the products produced in the
Hetero-Diels-Alder library one example was completed on a
large scale to allow confirmation of the structure by
spectroscopic means. The procedure followed was
essentially the same method as described for the
combinatorial library. In step A, 4-hydroxybenzaldehyde
was coupled to the photolabile group. In step D, aniline
was condensed with the aldehyde. In step E, the enol
ether was formed with 4,5-dihydro-2-methylfuran.

The photolysis of the compound (step F) was performed by
suspending 100 mg of the beads in DMF (0.3 mL) and
irradiating the beads with a UVP "Black Ray" model UVL 56
hand-held 366 nm lamp for 16 hrs. The DMF was removed to
one side by pipette and the beads rinsed with additional
DMF (2x 3 mL). The original solution and the washings
were combined and the solvent removed in vacuo. NMR
analysis of the reaction mixture showed it to contain the
desired azatricycle by comparison to the authentic sample.

Step G

A bead of interest was placed into a Pyrex glass capillary
tube sealed at one end. A solution (1 µL) of 1 M aqueous
cerium (IV) ammonium nitrate and acetonitrile (1:1) was
syringed into the tube, and the tube was then centrifuged
so that the bead lay on the bottom of the capillary and
was completely immersed by the reagent solution. Hexane
(2 mL) was added by syringe and the tube was again
centrifuged. The open end of the capillary was
flame-sealed and placed in an ultrasonic bath for 4 hrs. The
capillary was then placed inverted into a centrifuge and
spun such that the aqueous layer was forced through the
hexane layer to the bottom of the tube. This extraction
process was repeated 3 or 4 times and the tube was then
opened. The hexane layer (1.5 µL) was removed by syringe
and placed into a different capillary containing BSA (0.2
µL). This tube was sealed and centrifuged until the
reagents were thoroughly mixed. A portion of the solution
(ca. 1 µL) was removed and injected into a gas
chromatography machine with a 25 m x 0.2 mm methylsilicone
fused silica column with electron capture detection for
separation and interpretation of the tag molecules.

The sample was injected onto the GC column at 200 °C and 25
psi of carrier gas (He). After 1 minute the temperature
was increased at a rate of 20 °C per minute to 320 °C, and
the pressure was increased at a rate of 2 psi per minute
to 40 psi. These conditions are shown in the following
diagram:

[Diagram not reproduced: GC CONDITIONS, temperature versus
time, showing the 1 min initial hold and the 320 °C final
temperature.]
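
The GC program described above can be written as two linear ramps. The sketch below is illustrative: the function names are hypothetical, and it assumes that the pressure ramp starts after the same 1 minute hold as the temperature ramp, which the text suggests but does not state outright.

```python
def ramp(start: float, rate_per_min: float, limit: float,
         hold_min: float, t_min: float) -> float:
    """Hold at `start` for `hold_min`, then rise linearly until `limit`."""
    if t_min <= hold_min:
        return start
    return min(limit, start + rate_per_min * (t_min - hold_min))


def gc_conditions(t_min: float) -> tuple:
    """(oven temperature in deg C, carrier pressure in psi) at time t_min."""
    temperature_c = ramp(200, 20, 320, 1.0, t_min)
    # Assumption: the pressure ramp shares the 1 minute initial hold.
    pressure_psi = ramp(25, 2, 40, 1.0, t_min)
    return temperature_c, pressure_psi


for t in (0, 1, 4, 7, 8.5, 10):
    print(t, gc_conditions(t))
```

Under that assumption the oven reaches 320 °C seven minutes after injection, while the pressure reaches 40 psi at 8.5 minutes.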




The following results were obtained with four randomly
selected beads:

Bead 1
TAG DETECTED    Xf | Xe Xd Xc | Xb Xa
Ar              2-hydroxynaphthyl
R1              C6H11
R2              H
R3              CH3 (n=1)

Bead 2
TAG DETECTED    Xg | Xe Xd Xc | Xb
Ar              2-chloro-4-hydroxyphenyl
R1              C6H11
R2              H
R3              H (n=2)

Bead 3
TAG DETECTED    Xg | Xe Xd | Xb Xa
Ar              2-hydroxynaphthyl
R1              F3CO
R2              H
R3              H (n=2)

Bead 4
TAG DETECTED    Xf | Xe Xd | Xb
Ar              2-chloro-4-hydroxyphenyl
R1              F3CO
R2              H
R3              CH3 (n=1)
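
The bead decoding above can be sketched as a lookup from detected tags to the building blocks of Steps A, D and E. This is an illustration under stated assumptions: the single-letter codes a, c, e, f and g are inferred from Steps C to E and from the bead results, because they are not all legible in Table 7-1, and the dictionary and function names are hypothetical.

```python
# Keys are the detected identifier letters, sorted alphabetically.
ALDEHYDE = {  # Step C codes (Xa, Xb)
    "a": "4-hydroxybenzaldehyde",
    "b": "2-chloro-4-hydroxybenzaldehyde",
    "ab": "2-hydroxy-1-naphthaldehyde",
}
ANILINE = {  # Step D codes (Xc, Xd, Xe); "c" and "e" alone are inferred
    "c": "aniline",
    "d": "3,5-dimethylaniline",
    "cd": "3,4,5-trimethoxyaniline",
    "e": "4-trifluoromethylaniline",
    "ce": "4-phenoxyaniline",
    "de": "4-trifluoromethoxyaniline",
    "cde": "4-cyclohexylaniline",
}
ENOL_ETHER = {  # Step E codes (Xf, Xg)
    "f": "R3 = CH3 (n = 1)",
    "g": "R3 = H (n = 2)",
}


def decode_bead(tags):
    """Map detected tags such as ["Xg", "Xe", "Xd", "Xb"] to a reaction history."""
    letters = {tag[-1] for tag in tags}

    def key(allowed: str) -> str:
        return "".join(sorted(letters & set(allowed)))

    return ALDEHYDE[key("ab")], ANILINE[key("cde")], ENOL_ETHER[key("fg")]


print(len(ALDEHYDE) * len(ANILINE) * len(ENOL_ETHER))  # 42
print(decode_bead(["Xg", "Xe", "Xd", "Xb", "Xa"]))
```

The product of the three table sizes matches the 42 compounds of the library, and the second call corresponds to Bead 3 (naphthaldehyde, 4-trifluoromethoxyaniline, n = 2).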
EXAMPLE 8

Benzodiazepine Library

Following the procedure of Example 4, a combinatorial
library is constructed of the Formula X

[Structural formula X not reproduced]

wherein
R is a radical of a naturally occurring D or L amino acid;
R1 is H, C1-C6 alkyl, lower alkenyl, C1-C6 alkylamine,
carboxy C1-C6 alkyl, or phenyl C1-C6 alkyl wherein the
phenyl is optionally substituted by lower alkyl, F, Cl,
Br, OH, NH2, CO2H, or O-lower alkyl;
R2 is H or CO2H;
R3 is H or OH;
R4 is H or Cl;

with the provisos that when R3 is OH, R2 is H and when R2
is carboxy, R3 is H.

This library is released from a plurality of encoded beads
of the general formula

[Structural formula not reproduced]

wherein
IXn is a plurality of identifiers of the Formula Ia wherein
said plurality represents an encoded scheme;
S is a substrate;
F1-F2 is the residue of the linker member of Formula Ia;
and
R, R1, R2, and R4 are as defined for Formula X.
EXAMPLE 9

Typical Identifier Preparations

The diazo compound identifiers which are attached to the
resin via carbene formation are prepared as exemplified.

Compounds of the general formula

[Structural formula not reproduced]

wherein
n is 0-10 and
Ar is pentachlorophenol, 2,4,6-trichlorophenol,
2,4,5-trichlorophenol, or 2,6-dichloro-4-fluorophenol

are prepared as follows.

To a solution of
1-hydroxy-4-(2,6-dichloro-4-fluorophenoxy)butane
(0.38 g, 1.5 mmol), methyl isovanillate
(0.228 g, 1.5 mmol) and triphenylphosphine (0.393 g, 1.5
mmol) in THF (8 mL) was added diethylazodicarboxylate
(0.287 g, 1.7 mmol). The solution was stirred at r.t. for 36
hrs. The solvent was removed in vacuo and the residue
purified by chromatography on silica gel (with a mixture
of 20% ethyl acetate and 80% petroleum ether) affording
0.45 g of the aldehyde (77% yield).

The aldehyde (100 mg, 0.26 mmol) was dissolved in acetone
(8 mL) and was treated with a solution of KMnO4 (61 mg,
0.39 mmol) in acetone (4 mL) and water (4 mL). The
reaction was stirred at room temperature for 13 hrs. The
mixture was diluted with ethyl acetate (100 mL) and water
(50 mL) and the layers were separated. The aqueous layer
was extracted with additional ethyl acetate (2x 100 mL).
The combined organic layers were washed with water (50 mL)
and dried (MgSO4). Removal of the solvent afforded 109 mg
of the benzoic acid (93% yield).

A solution of the acid (76 mg, 0.188 mmol) in methylene
chloride (2 mL) was treated with oxalyl chloride (36 mg,
0.28 mmol) and catalytic DMF. After stirring for 10 min
at room temperature, slow but steady evolution of gas was
observed. Stirring continued for 2 hrs., after which the
solution was diluted with DCM (15 mL) and washed with saturated
aqueous sodium hydrogencarbonate solution (5 mL). The
layers were separated. The organic layer was dried
(Na2SO4) and the solvent evaporated affording the benzoyl
chloride as pale yellow crystals.

The benzoyl chloride was dissolved in methylene chloride
(5 mL) and was added to a stirring solution of
diazomethane in ether at -78 °C. The cold bath was allowed
to warm up and the mixture allowed to stir for 5 hrs at
room temperature. The solvents and excess diazomethane
were removed in vacuo and the residue purified by
chromatography on silica gel using a gradient elution method
where the concentration of ethyl acetate ranged from 10%
to 40% in hexanes affording 48 mg of the diazo compound
(60% yield).

Compounds of the general formula

[Structural formula not reproduced]

wherein:
n is 0-10 and
Ar is pentachlorophenol, 2,4,6-trichlorophenol,
2,4,5-trichlorophenol, or 2,6-dichloro-4-fluorophenol
are prepared as follows.

Methyl vanillate (0.729 g, 4.0 mmole),
1-hydroxy-9-(2,3,4,5,6-pentachlorophenoxy)nonane (1.634 g, 4.0 mmole)
and triphenylphosphine (1.259 g, 4.8 mmole) were dissolved
in 20 mL dry toluene under argon. DEAD (0.76 mL, 0.836 g,
4.8 mmole) was added dropwise, and the mixture was stirred
at 25 °C for one hour. The solution was concentrated to
half volume and purified by flash chromatography eluting
with DCM to give 1.0 g (1.7 mmole, 43%) of the product as
a white crystalline solid.

The methyl ester above (1.0 g, 1.7 mmole) was dissolved in
50 mL THF, 2 mL water was added followed by lithium
hydroxide (1.2 g, 50 mmole). The mixture was stirred at
25 °C for one hour then refluxed for five hours. After
cooling to 25 °C the mixture was poured onto ethyl acetate
(200 mL) and the solution was washed with 1 M HCl (3x 50
mL) then sat. aq. NaCl (1x 50 mL) and dried over sodium
sulfate. The solvent was removed and the crude acid
azeotroped once with toluene.

The crude material above was dissolved in 100 mL toluene,
1.0 mL (1.63 g, 14 mmole) thionyl chloride was added, and
the mixture was refluxed for 90 min. The volume of the
solution was reduced to approximately 30 mL by
distillation, then the remaining toluene removed by
evaporation. The crude acid chloride was dissolved in 20
mL dry DCM and cooled to -78 °C under argon and a solution
of approximately 10 mmole diazomethane in 50 mL anhydrous
ether was added. The mixture was warmed to room
temperature and stirred for 90 min. Argon was bubbled
through the solution for 10 min. then the solvents were
removed by evaporation and the crude material was purified
by flash chromatography eluting with 10-20% ethyl acetate
in hexane. The diazoketone (0.85 g, 1.4 mmole, 82% over
three steps) was obtained as a pale yellow solid.
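
The yields quoted for this sequence follow directly from the millimole amounts given in the text. A minimal sketch, with a hypothetical helper name, that also derives the overall yield from methyl vanillate (a figure the specification does not state):

```python
def percent_yield(product_mmol: float, limiting_mmol: float) -> float:
    """Isolated yield in percent relative to the limiting reagent."""
    return 100.0 * product_mmol / limiting_mmol


ester_mmol = 1.7        # methyl ester isolated from 4.0 mmole methyl vanillate
diazoketone_mmol = 1.4  # diazoketone isolated from the 1.7 mmole of ester

three_step = percent_yield(diazoketone_mmol, ester_mmol)
overall = percent_yield(diazoketone_mmol, 4.0)

print(f"hydrolysis, chlorination, diazotization: {three_step:.0f} %")
print(f"overall from methyl vanillate:           {overall:.0f} %")
```

The first figure reproduces the stated 82% over three steps; the second, about 35%, is simply the product of the 43% coupling yield and the 82% three-step yield.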

The following identifiers have been prepared as described
above:

Photolabile Cleavage

50 Identifiers were prepared of the formula:

[Structural formula not reproduced]

and n is 1,2,3,4,5,6,7,8,9, and 10.

Oxidative Cleavage Type I

7 Identifiers were prepared of the formula

[Structural formula not reproduced]

and n is 4,5,6,7,8,9, and 10.

Oxidative Cleavage Type II

13 Identifiers were prepared of the formula

[Structural formula not reproduced]

and n is 1,2,3,4,5,6,7,8,9,10;
and wherein:
Ar is

[Structure not reproduced]

and n is 0, 3, and 9.

It is evident from the above description that the subject
invention provides a versatile, simple method for
identifying compounds, where the amount of compound
present precludes any assurance of the ability to obtain
an accurate determination of its reaction history. The
method allows for the production of extraordinarily large
numbers of different products, which can be used in
various screening techniques to determine biological or
other activity of interest. The use of tags which are
chemically inert under the process conditions allows for
great versatility in a variety of environments produced by
the various synthetic techniques employed for producing
the products of interest. The tags can be readily
synthesized and permit accurate analysis, so as to
accurately define the nature of the composition.

All publications and patent applications cited in this
specification are herein incorporated by reference as if
each individual publication or patent application were
specifically and individually indicated to be incorporated
by reference.

Although the foregoing invention has been described in
some detail by way of illustration and example for
purposes of clarity of understanding, it will be readily
apparent to those of ordinary skill in the art in light of
the teachings of this invention that certain changes and
modifications may be made thereto without departing from
the spirit or scope of the appended claims.

WHAT IS CLAIMED IS:

1. A method for recording the reaction history of a
reaction series on each of a plurality of unique
solid supports, wherein said reaction series involves
at least two stages requiring differing agents or
reaction conditions resulting in a different
modification as to a plurality of said unique solid
supports, resulting in a plurality of different final
products on different unique solid supports,
employing a combination of identifiers for recording
said reaction history, said identifiers characterized
by defining the choice of agent or reaction condition
and the stage in said reaction series and being
capable of being analyzed as to the choice and stage,
said method comprising:

reacting, at a first or intermediate stage of said
series, a different agent or employing a different
reaction condition with each of a group of said
unique solid supports, said group comprising at least
one of said unique solid supports, and a combination
of identifiers wherein said combination of
identifiers defines the choice of agent and the stage
in said series as to each group of said unique solid
supports, each of said identifiers being individually
bound to said unique solid support directly or
through other than a prior identifier;

mixing said groups together and then dividing said
plurality of unique solid supports into a plurality
of groups for a second intermediate or final stage;
and

repeating said reacting at least once to provide a
plurality of final products, having different
products on the different individual unique solid
supports.
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A method according to Claim 1, wherein at least 100 
unique solid supports and at least 2 groups are 
employed in each said reacting. 

3. A method according to Claim 1, including the
additional stages of screening said final products on
said unique solid supports for a characteristic of
interest; and identifying the reaction history of at
least one final product having said characteristic of
interest.

4. A method of Claim 1 further comprising cleaving the
final product from the solid support and screening
said final product.

5. A method of Claim 1 further comprising treating the
identifiers so as to detach the tag components from
the solid supports and reacting said tag components
with a moiety capable of detection by fluorescence or
electron capture.

6. A method of Claim 5, wherein the detaching is done
photochemically or oxidatively and the detectable
moiety is derived from dansyl chloride or a
polyhalobenzoylhalide.

7. A method according to Claim 5, wherein said tag
components have two characteristics, a characteristic
capable of separation and a characteristic capable of
detection.

8. A method according to Claim 7, wherein said
characteristic capable of detection is the ability to
be detected by electron capture.

9. A method according to Claim 7, wherein said
characteristic capable of detection is the ability to
be detected by mass spectroscopy.

10. A method according to Claim 7, wherein said
characteristic capable of detection is radioactivity.

11. A method according to Claim 7, wherein said
characteristic capable of detection is fluorescence.

12. A method according to Claim 7, wherein said tags may
be separated by means of chromatography.



13. A kit comprising a plurality of different separated
organic compounds, each of the compounds
characterized by having a distinguishable
composition, encoding at least one bit of different
information which can be determined by a physical
measurement and sharing at least one common
functionality.

14. A kit of Claim 13 comprising at least 4 different
functional organic compounds.

15. A kit according to Claim 13, wherein said functional
organic compounds are of the formula:

F1-F2-C-E-C'

where F1-F2 is a linker which allows for attachment to
and detachment from a solid particle; and
C-E-C' is a tag which can be determined by a physical
measurement.

16. A kit according to Claim 15, wherein said functional
organic compounds differ by the number of methylene
groups and/or halogens, nitrogens or sulfurs present.

17. A kit according to Claim 15 wherein the C-E-C'
portion can be removed photochemically.

18. A kit according to Claim 15 wherein the C-E-C'
portion can be removed oxidatively, hydrolytically,
thermolytically, or reductively.

19. A solid support characterized by having a ligand
bound thereto and having a combination of identifiers
bound to said solid support.

20. A solid support according to Claim 19, wherein said
ligand is an oligomer which is an oligopeptide,
oligonucleotide, oligosaccharide, polylipid,
polyester, polyamide, polyurethane, polyurea,
polyether, poly(phosphorus derivative) which is a
phosphate, phosphonate, phosphoramide,
phosphonamide, phosphite, or phosphinamide,
poly(sulfur derivative) which is a sulfone, sulfonate,
sulfite, sulfonamide, or sulfenamide, where for the
phosphorus and sulfur derivatives the indicated
heteroatom for the most part will be bonded to C, H,
N, O or S, and combinations thereof.

21. A solid support according to claim 19 wherein said
ligand is a non-oligomer which is heterocyclic,
aromatic, alicyclic, or aliphatic, and combinations
thereof.

22. A solid support of Claim 21 wherein the non-oligomer
is a diazabicyclic, an azatricyclic, or a branched
amide compound.

23. A solid support of Claim 19 wherein the ligand is
linked to the support through a non-labile linkage.

24. A solid support of Claim 19 wherein the ligand is
linked to the support through a cleavable linkage.

25. A solid support according to Claim 19, wherein the
identifier comprises tags, the tags being
radioisotopes, or haloalkyl or haloarylalkyl
containing compounds.

26. A solid support of Claim 19 which is a bead of about
10-2000 µm in diameter, and wherein the identifiers
comprise tag components which after cleavage from the
bead can be separated by gas chromatography and/or
liquid chromatography and detected by electron capture,
mass spectroscopy, fluorescence, or atomic emission
techniques.

27. A library comprising a plurality of solid supports
according to claim 22.

28. A library of Claim 27, wherein the final products
have been cleaved from the solid support.

29. A library of Claim 28, wherein the final products are
diazabicyclic, azatricyclic, or branched amide
compounds.

30. A process for identifying compounds having a
characteristic of interest which comprises screening
a library of Claim 27.

31. A process of Claim 30, wherein the compounds have
been cleaved from the solid surface.

32. A process of Claim 31, wherein the compound is a
diazabicyclic, azatricyclic, or branched amide
compound.

33. A method for producing a ligand involving a reaction
series employing a method for recording the reaction
history of a reaction series on each of a plurality
of unique solid supports, wherein said reaction
series involves at least two stages requiring
differing agents and/or reaction conditions resulting
in a different modification as to a plurality of said
unique solid supports, resulting in a plurality of
different final products on different unique solid
supports, employing a combination of identifiers for
recording said reaction history, said identifiers
characterized by defining the choice of agent or
reaction condition and the stage in said series and
being capable of being analyzed as to the choice and
stage, said method comprising:

reacting, at a first or intermediate stage of said
series, a different agent or employing a different
reaction condition with each of a group of said
unique solid supports, said group comprising at least
one of said unique solid supports, and a combination
of identifiers wherein said combination of
identifiers defines the choice of agent and the stage
in said series as to each group of said unique solid
supports, each of said identifiers being individually
bound to said unique solid support directly or
through other than a prior identifier;
mixing said groups together and then dividing said
plurality of unique solid supports into a plurality
of groups for a second intermediate or final stage;
repeating said reacting at least once to provide a
plurality of ligands, having different products on
the different individual unique solid surfaces; and
identifying said reaction history of at least one
selected unique solid surface by means of said
combination of identifiers.

34. A ligand according to Claim 33, wherein said
identifying includes the stage of screening said
ligands for a characteristic of interest.

35. A method for producing a ligand involving a reaction
series employing a method for recording the reaction
history of a reaction series on each of a plurality
of unique solid surfaces, wherein said reaction
series involves at least two stages requiring
differing agents and/or reaction conditions resulting
in a different modification as to each of a plurality
of said unique solid surfaces, resulting in a
plurality of different ligands on different unique
solid surfaces, employing combinations of identifiers
for recording said reaction history, said combination
of identifiers characterized by defining the choice
of agent and/or reaction condition and the stage in
said series and being capable of being analyzed as to
the choice and stage, said method comprising:
reacting, at a first or intermediate stage of said
series, a different agent and/or employing a
different reaction condition with each of a group of
said unique solid surfaces, said group comprising at
least one of said unique solid surfaces, and a
combination of identifiers wherein said combination
of identifiers defines the choice of agent and the
stage in said series as to each group of said unique
solid surfaces, each of said identifiers being
individually bound to said unique solid surface
through other than a prior identifier by a cleavable
link;

mixing said groups together and then dividing said
plurality of unique solid surfaces into a plurality
of groups for a second intermediate or final stage;
repeating said reacting to provide a plurality of
ligands having different ligands on the different
individual unique solid surfaces;
screening the ligands from a plurality of each of
said unique solid surfaces for a characteristic of
interest; and

identifying said reaction history of at least one
selected unique solid surface having ligand having
said characteristic of interest by detaching the tag
members from said unique solid surface and
identifying said tag members by means of a differing
characteristic.

36. A method according to Claim 35, wherein said tags
differ in an homologous series and are detected by
electron capture gas chromatography or mass
spectroscopy.

37. A compound of the Formula I:

F1-F2-C-E-C'    I

where F1-F2 is a linker which allows for attachment to
and detachment from a support; and
C-E-C' is the tag which is capable of analysis;
E is a tag component which allows for detection, or
allows for detection and provides for separation as
a result of variable substitution;
C and C' are tag components which allow for
individual detection;
F2 is a linking component capable of being selectively
cleaved to release the tag components; and
F1 is a functional group which allows ready attachment
of the compound to a synthesis support.

38. A compound of Claim 37 having the formula:

F1-F2-(C(E-C')a)b

wherein:

F1 is CO2H, CH2X, NR1R1, C(O)R1, OH, CHN2, SH, C(O)CHN2,
S(O2)Cl, S(O2)CHN2, N3, NO2, NO, S(O2)N3, OC(O)X,
C(O)X, NCO, or NCS;
F2 is [structural formulae not recoverable from the source text]
with the proviso that when F2 is a bond, F1 is OH or
COOH;

A is -O-, -OC(O)O-, -OC(O)-, or -NHC(O)-;
C is a bond, C1-C20 alkylene optionally substituted by
1-40 F, Cl, Br, C1-C6 alkoxy, NR4R4, OR4, or NR4, or
-[(C(R4)2)m-Y-Z-Y-(C(R4)2)nY-Z-Y]p-; with the proviso
that the maximum number of carbon atoms in C+C' is
20;
C' is H; F; Cl; C1-C20 alkylene optionally substituted
by 1-40 F, Cl, Br, C1-C6 alkoxy, NR4R4, OR4, or NR4, or
-[(C(R4)2)m-Y-Z-Y-(C(R4)2)nY-Z-Y]p-; with the proviso
that the maximum number of carbon atoms in C+C' is
20;

E is C[illegible]-C[illegible] alkyl substituted by 1-20 F, Cl or Br; or
Q-aryl
wherein the aryl is substituted by 1-7 F, Cl, NO2,
SO2R5, or substituted phenyl wherein the substituent
is 1-5 F, Cl, NO2, or SO2R5;

E-C' may be -H, -OH, or amino;
R1 is H or C1-C6 alkyl;
R3 is C=O, C(O)O, C(O)NR1, S, SO, or SO2;
R4 is H or C1-C6 alkyl;
R5 is C1-C6 alkyl;
a is 1-5;
b is 1-3;
m and n are each 0-20;
p is 1-7;
Q is a bond, O, S, NR4, C=O, -C(O)NR5, -NR5C(O)-,
-C(O)O-, or -OC(O)-;
X is a leaving group such as Br, Cl, triflate,
mesylate, tosylate, or OC(O)OR5;
Y is a bond, O, S, or NR4;
Z is a bond; phenyl optionally substituted by 1-4 F,
Cl, Br, C1-C6 alkyl, C1-C6 alkoxy, C1-C6 alkyl
substituted by 1-13 F, Cl, or C1-C6 alkyloxy
substituted by 1-13 F, Cl, or Br; (C(R4)2) or
(CF2)1-20; with the proviso that when Z is a bond one
of its adjacent Y's is also a bond and aryl is a
mono- or bi-cyclic aromatic ring containing up to 10
carbon atoms and up to 2 heteroatoms selected from O,
S, and N.

39. A compound of Claim 38 wherein:
F1 is CO2H, OH, CHN2, C(O)CHN2, C(O)X, NCS, or CH2X;
C and C' is each independently C1-C[illegible] alkylene
unsubstituted or substituted by 1-40 F or Cl, or
[O-(CH2)2-3]p;
E is C[illegible]-C[illegible] alkyl substituted by 1-20 F or Cl; Q-aryl
where aryl is a bi-cyclic aromatic ring substituted
by 1-7 F or Cl; or Q-phenyl substituted by 1-5 F, Cl,
NO2, or SO2R5; and
Q is a bond, O, -NR5C(O)-, or -OC(O)-.



41. A compound of Claim 38 having the formula:

[structural formula not recoverable from the source text]

wherein Ar is pentafluoro-, pentachloro-, or
pentabromophenyl,
2,3,5,6-tetrafluoro-4(2,3,4,5,6-pentafluorophenyl)phenyl,
2,4,6-trichlorophenyl, 2,4,5-trichlorophenyl,
2,6-dichloro-4-fluorophenyl, or 2,3,5,6-tetrafluorophenyl.

42. A compound of Claim 38 wherein:
E-C' is H, OH, or NH2.

43. A composition of the formula

S-F1'-F2-C-E-C'

wherein:

S is a soluble or solid support;

C-E-C' is the tag which is capable of analysis where
E is a tag component which (a) allows for detection,
such as an electrophoric group which can be analyzed
by gas chromatography or mass spectroscopy or (b)
allows for detection and for separation as a result
of variable substitution;

C and C' are tag components which allow for
distinguishing one tag from all other tags, usually
allowing for separation as a result of variable
length or substitution, for example, varying the
chromatographic retention time or the mass
spectroscopy ratio m/e;

F2 is a linking component capable of being selectively
cleaved to release the tag; and

F1' is a functional group which provides for
attachment to the support.

44. A composition of claim 43 wherein:

S is a capillary, hollow fiber, needle, solid fiber,
cellulose bead, pore-glass bead, silica gel,
polystyrene bead optionally cross-linked with
divinylbenzene, grafted co-poly bead, poly-acrylamide
bead, latex bead, dimethylacrylamide bead optionally
cross-linked with N,N'-bis-acryloyl ethylene diamine,
glass particles coated with a hydrophobic polymer, or
low molecular weight non-cross-linked polystyrene;
and

F1'-F2-C-E-C' is the residue of Formula I attached to
S.

45. The method of Claim 1, wherein the combination of 
identifiers defines a binary coding scheme. 

46. The method of Claim 1, wherein the identifiers are of 
Claim 37. 

47. The method of Claim 1, wherein the identifiers are of 
Claim 38. 

48. The method of Claim 1, wherein the identifiers are of 
Claim 39. 

49. The method of Claim 1, wherein the identifiers are of
Claim 42.

50. The method of Claim 1 further comprising detaching
the tag members from said unique solid surfaces.

51. The method of Claim 50 wherein the tag members are
detached photochemically, oxidatively,
hydrolytically, thermolytically, or reductively.

52. The method of Claim 1 further comprising detaching
non-oligomer ligands from said unique solid surfaces
photochemically.

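53. A compound of the formula

[structural formula not recoverable from the source text]

wherein:

P is a polystyrene resin;

[illegible] is a plurality of residues of the formula

[structural formula not recoverable from the source text]

wherein:
n is 1-6;
R is CH3, CH(CH3)2, [illegible], [illegible], [illegible], or
CH2C6H5; and
R1 is H, CH3, [illegible], [illegible], or [illegible].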

54. A method of synthesizing a chemical compound so that 
the structure of the compound is readily 
determinable, which comprises synthesizing the 
compound on the surface of a solid support under 
conditions such that the solid support at the 
completion of the synthesis of the compound has bound 
to it a plurality of identifiers which encode the 
reaction stages associated with the synthesis of the 
compound.

55. A method of synthesizing a library of chemical
compounds so that the structure of each compound in 
the library is readily determinable which comprises 
synthesizing each compound on the surface of a unique 
solid support under conditions such that each such 
unique support at the completion of the synthesis of 
the library of compounds has bound to it a plurality 
of identifiers which encode the reaction stages 
associated with the synthesis of the compound 
synthesized on such solid support. 

56. A method of determining the structure of a chemical 
compound which comprises synthesizing the compound by 
the method of claim 54 or 55, isolating the solid 
support upon which the compound was synthesized, 
treating the solid support so isolated so as to cause 
the tag components of each of the identifiers bound 
to the solid support to be released, determining the 
identity or quantity or both of each tag component so 
released, and deriving the structure of the compound 
from the identities or quantities or both of all such 
tag components. 

57. A method of identifying a compound having a desired 
characteristic which comprises synthesizing a library 
of chemical compounds by the method of claim 55, 
separately testing each of the compounds in the 
resulting library in an assay which identifies 
compounds having the desired characteristic so as to 
identify any compounds present in the library which
have the desired characteristic.

58. A method of claim 57, further comprising determining
the structure of the compound so identified.

59. A library of chemical compounds, each compound in the 
library being bound to a unique solid support and
each such solid support having bound to it a
plurality of identifiers which encode the reaction 
stages associated with the synthesis of the compound 
bound to such solid support. 

60. A library of claim 59, wherein compounds in the 
library are diazabicyclic compounds. 

61. A library of claim 59, wherein compounds in the
library are azatricyclic compounds.

62. A library of claim 59, wherein compounds in the 
library are branched amide compounds. 

63. A library of claim 59, wherein compounds in the
library are peptides.

64. A method of identifying a compound having a desired
characteristic which comprises testing a library of
chemical compounds according to claim 58 in an assay
which identifies compounds having the desired
characteristic so as to identify any compounds present
in the library which have the desired characteristic.

65. A method of claim 64, further comprising determining 
the structure of the compound so identified. 

66. A compound identified by the method of claim 63. 

67. A method of claim 64, wherein the desired
characteristic is antagonism for the human neurokinin
1/bradykinin receptor and the library of chemical
compounds comprises azatricyclic compounds.

68. A method of claim 64, wherein the desired
characteristic is usefulness as a muscle relaxant, a
tranquilizer or a sedative and the library of
chemical compounds comprises benzodiazepines.

69. A method of claim 64, wherein the desired
characteristic is usefulness in the treatment of
hypertension or Raynaud's syndrome and the library of
chemical compounds comprises branched amides.
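
By way of illustration only, the recording and reading of a reaction history recited in Claims 1, 45 and 56 can be sketched as a small Python simulation. Nothing in the sketch is taken from the specification: the function names (`tags_for`, `split_and_pool`, `decode`), the reagent labels and the layout of tag positions are hypothetical, and real tags are chemical species detected physically rather than integers. The sketch assumes a binary coding scheme in which each stage owns a block of tag positions and the choice of agent at that stage is written as a binary number by attaching the corresponding subset of tags to the bead.

```python
import random

def tags_for(stage, choice, bits_per_stage):
    """Tag positions encoding a 1-based `choice` made at 0-based `stage` in binary."""
    return {stage * bits_per_stage + b
            for b in range(bits_per_stage) if (choice >> b) & 1}

def split_and_pool(n_beads, reagents_per_stage, seed=0):
    """Simulate mix-and-divide synthesis, tagging each bead at every stage."""
    rng = random.Random(seed)
    # Enough bits to write the largest 1-based choice number of any stage.
    bits = max(len(r) for r in reagents_per_stage).bit_length()
    beads = [{"product": [], "tags": set()} for _ in range(n_beads)]
    for stage, reagents in enumerate(reagents_per_stage):
        rng.shuffle(beads)                      # mixing said groups together
        for i, bead in enumerate(beads):        # dividing into groups
            choice = i % len(reagents) + 1      # 1-based, so at least one tag is attached
            bead["product"].append(reagents[choice - 1])
            bead["tags"] |= tags_for(stage, choice, bits)
    return beads, bits

def decode(tags, reagents_per_stage, bits):
    """Recover the reaction history of one bead from the set of tags found on it."""
    history = []
    for stage, reagents in enumerate(reagents_per_stage):
        choice = sum(1 << b for b in range(bits) if stage * bits + b in tags)
        history.append(reagents[choice - 1])
    return history

# Hypothetical three-stage series with 3, 2 and 4 agents respectively.
reagents = [["A", "B", "C"], ["D", "E"], ["F", "G", "H", "I"]]
beads, bits = split_and_pool(60, reagents)
# Every bead's tag set decodes back to the sequence of agents it actually saw.
assert all(decode(b["tags"], reagents, bits) == b["product"] for b in beads)
```

In this sketch the tags are attached to the bead itself and never to one another, so decoding needs only the set of tags present and not their order, which loosely mirrors the requirement that each identifier be bound "directly or through other than a prior identifier".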